# Useful Tools
## Analysis Tools
### Dataset Visualization Tool
MMOCR provides a dataset visualization tool `tools/analysis_tools/browse_dataset.py` to help users troubleshoot possible dataset-related problems. You only need to specify the path to the training config, and the tool will automatically plot the images transformed by the corresponding data pipeline together with the GT labels. The following example demonstrates how to use the tool to visualize the training data used by the "DBNet_R50_icdar2015" model.
```Bash
# Example: Visualizing the training data used by dbnet_r50dcnv2_fpnc_1200e_icdar2015
python tools/analysis_tools/browse_dataset.py configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_icdar2015.py
```
The visualization results will look like this:
<center class="half">
<img src="https://user-images.githubusercontent.com/24622904/187611542-01e9aa94-fc12-4756-964b-a0e472522a3a.jpg" width="250"/><img src="https://user-images.githubusercontent.com/24622904/187611555-3f5ea616-863d-4538-884f-bccbebc2f7e7.jpg" width="250"/><img src="https://user-images.githubusercontent.com/24622904/187611581-88be3970-fbfe-4f62-8cdf-7a8a7786af29.jpg" width="250"/>
</center>
Based on this tool, users can easily verify whether the annotations of a custom dataset are correct. You can also verify whether the data augmentation strategies run as expected by modifying `train_pipeline` in the configuration file, as sketched after the table below. The optional parameters of `browse_dataset.py` are as follows.
| ARGS | Type | Description |
| --------------- | ----- | ------------------------------------------------------------------------------------- |
| config | str | (required) Path to the config. |
| --output-dir    | str   | If GUI is not available, specify an output path to save the visualization results. |
| --show-interval | float | Interval of visualization (s), defaults to 2. |
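For example, to check a specific augmentation, you can edit `train_pipeline` in the config and re-run the tool. The snippet below is a minimal sketch of such an edit; the transforms and parameters shown are illustrative assumptions, so keep the transforms from your actual config and only tweak the step you want to inspect.

```python
# Illustrative sketch of a train_pipeline override in a config file.
# The transform names/parameters below are assumptions for demonstration;
# start from the pipeline in your own config and modify only the step
# whose effect you want to visualize.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadOCRAnnotations', with_bbox=True, with_polygon=True),
    # e.g. enlarge the rotation range so its effect is easy to spot
    dict(type='RandomRotate', max_angle=30),
    dict(type='Resize', scale=(640, 640), keep_ratio=True),
    dict(type='PackTextDetInputs'),
]
```

Running `browse_dataset.py` on the modified config then shows the augmented samples directly.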
### Offline Evaluation Tool
For saved prediction results, we provide an offline evaluation script `tools/analysis_tools/offline_eval.py`. The following example demonstrates how to use this tool to evaluate the output of the "PSENet" model offline.
```Bash
# When running the test script for the first time, you can save the output of the model by specifying the --save-preds parameter
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --save-preds
# Example: Testing on PSENet
python tools/test.py configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py epoch_600.pth --save-preds
# Then, use the saved outputs for offline evaluation
python tools/analysis_tools/offline_eval.py ${CONFIG_FILE} ${PRED_FILE}
# Example: Offline evaluation of saved PSENet results
python tools/analysis_tools/offline_eval.py configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py work_dirs/psenet_r50_fpnf_600e_icdar2015/epoch_600.pth_predictions.pkl
```
`--save-preds` saves the outputs to `work_dir/CONFIG_NAME/MODEL_NAME_predictions.pkl` by default.
In addition, based on this tool, users can also convert predictions obtained from other libraries into MMOCR-supported formats and then use MMOCR's built-in metrics to evaluate them, as sketched below.
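Since the exact prediction schema depends on the model, a practical first step is to inspect a pickle produced by `--save-preds` and mirror its structure when converting external results. The sketch below uses only the standard `pickle` module; the file path and the `convert_external_result` helper are hypothetical placeholders.

```python
import pickle

# Hypothetical path to a predictions file saved via --save-preds
pkl_path = 'work_dirs/psenet_r50_fpnf_600e_icdar2015/epoch_600.pth_predictions.pkl'

# Load the saved predictions and inspect their structure; mirroring it is
# the safest way to make external predictions consumable by
# tools/analysis_tools/offline_eval.py.
with open(pkl_path, 'rb') as f:
    predictions = pickle.load(f)

print(type(predictions))   # inspect the container type
print(predictions[0])      # inspect the fields of one per-image result

# Sketch: convert external results into the same structure and dump them
# to a pickle for offline_eval.py. convert_external_result is a placeholder
# for your own conversion logic.
# converted = [convert_external_result(r) for r in external_results]
# with open('converted_predictions.pkl', 'wb') as f:
#     pickle.dump(converted, f)
```

The optional parameters of `offline_eval.py` are as follows.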
| ARGS | Type | Description |
| ------------- | ----- | --------------------------------- |
| config | str | (required) Path to the config. |
| pkl_results | str | (required) The saved predictions. |
| --cfg-options | str   | Override configs. [Example](<>)   |