---
comments: true
---
# Seal Text Recognition Pipeline Tutorial
## 1. Introduction to Seal Text Recognition Pipeline
Seal text recognition is a technology that automatically extracts and recognizes the content of seals from documents or images. The recognition of seal text is part of document processing and has many applications in various scenarios, such as contract comparison, warehouse entry and exit review, and invoice reimbursement review.
The seal text recognition pipeline is used to recognize the text content of seals, extracting the text information from seal images and outputting it in text form. This pipeline integrates the industry-renowned end-to-end OCR system PP-OCRv4, supporting the detection and recognition of curved seal text. Additionally, this pipeline integrates an optional layout region localization module, which can accurately locate the layout position of the seal within the entire document. It also includes optional document image orientation correction and distortion correction functions. Based on this pipeline, millisecond-level accurate text content prediction can be achieved on a CPU. This pipeline also provides flexible service deployment methods, supporting the use of multiple programming languages on various hardware. Moreover, it offers custom development capabilities, allowing you to train and fine-tune on your own dataset based on this pipeline, and the trained model can be seamlessly integrated.
The seal text recognition pipeline includes a seal text detection module and a text recognition module, as well as an optional layout detection module, document image orientation classification module, and text image correction module.
- [Seal Text Detection Module](../module_usage/seal_text_detection.en.md)
- [Text Recognition Module](../module_usage/text_recognition.en.md)
- [Layout Detection Module](../module_usage/layout_detection.en.md) (Optional)
- [Document Image Orientation Classification Module](../module_usage/doc_img_orientation_classification.en.md) (Optional)
- [Text Image Unwarping Module](../module_usage/text_image_unwarping.en.md) (Optional)
If you prioritize model accuracy, choose a model with higher accuracy. If you prioritize inference speed, choose a model with faster inference speed. If you prioritize model storage size, choose a model with smaller storage size.
Layout Region Detection Module (Optional):
* The layout detection model includes 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, seal, figure_table title, chart, sidebar text, and lists of references.

Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
PP-DocLayout_plus-L | Inference Model/Training Model | 83.2 | 34.6244 / 10.3945 | 510.57 / - | 126.01 M | A higher-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, PPT, multi-layout magazines, contracts, books, exams, ancient books and research reports using RT-DETR-L |

Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
PP-DocLayout-L | Inference Model/Training Model | 90.4 | 34.6244 / 10.3945 | 510.57 / - | 123.76 M | A high-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using RT-DETR-L. |
PP-DocLayout-M | Inference Model/Training Model | 75.2 | 13.3259 / 4.8685 | 44.0680 / 44.0680 | 22.578 | A layout area localization model with balanced precision and efficiency, trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-L. |
PP-DocLayout-S | Inference Model/Training Model | 70.9 | 8.3008 / 2.3794 | 10.0623 / 9.9296 | 4.834 | A high-efficiency layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-S. |

Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
PicoDet-S_layout_3cls | Inference Model/Training Model | 88.2 | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S. |
PicoDet-L_layout_3cls | Inference Model/Training Model | 89.0 | 13.05 / 4.50 | 41.30 / 41.30 | 22.6 | A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L. |
RT-DETR-H_layout_3cls | Inference Model/Training Model | 95.8 | 114.93 / 27.71 | 947.56 / 947.56 | 470.1 | A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H. |

Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
PicoDet-S_layout_17cls | Inference Model/Training Model | 87.4 | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S. |
PicoDet-L_layout_17cls | Inference Model/Training Model | 89.0 | 13.50 / 4.69 | 43.32 / 43.32 | 22.6 | A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L. |
RT-DETR-H_layout_17cls | Inference Model/Training Model | 98.3 | 115.29 / 104.09 | 995.27 / 995.27 | 470.2 | A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H. |
Document Image Orientation Classification Module (Optional):

Model | Model Download Link | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Description |
---|---|---|---|---|---|---|
PP-LCNet_x1_0_doc_ori | Inference Model/Training Model | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | A document image classification model based on PP-LCNet_x1_0, containing four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees |
Text Image Correction Module (Optional):
Model | Model Download Link | CER | Model Storage Size (M) | Description |
---|---|---|---|---|
UVDoc | Inference Model/Training Model | 0.179 | 30.3 M | High-precision text image correction model |
Text Detection Module:

Model | Model Download Link | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Description |
---|---|---|---|---|---|---|
PP-OCRv4_server_seal_det | Inference Model/Training Model | 98.21 | 74.75 / 67.72 | 382.55 / 382.55 | 109 | PP-OCRv4 server-side seal text detection model, with higher accuracy, suitable for deployment on better servers |
PP-OCRv4_mobile_seal_det | Inference Model/Training Model | 96.47 | 7.82 / 3.09 | 48.28 / 23.97 | 4.6 | PP-OCRv4 mobile-side seal text detection model, with higher efficiency, suitable for deployment on the edge |
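
The earlier advice about choosing between accuracy, speed, and storage size can be made concrete with a small sketch. It is an illustration only: the `MODELS` list copies the two seal text detection rows from the table above (normal mode figures), and `pick_model` is a hypothetical helper, not part of PaddleX.

```python
# Figures copied from the seal text detection table (normal mode).
MODELS = [
    {"name": "PP-OCRv4_server_seal_det", "hmean": 98.21, "cpu_ms": 382.55, "size_mb": 109.0},
    {"name": "PP-OCRv4_mobile_seal_det", "hmean": 96.47, "cpu_ms": 48.28, "size_mb": 4.6},
]


def pick_model(models, priority):
    """Hypothetical helper: return the model name that best fits one priority."""
    if priority == "accuracy":
        return max(models, key=lambda m: m["hmean"])["name"]
    if priority == "speed":
        return min(models, key=lambda m: m["cpu_ms"])["name"]
    if priority == "size":
        return min(models, key=lambda m: m["size_mb"])["name"]
    raise ValueError(f"unknown priority: {priority}")


print(pick_model(MODELS, "accuracy"))
print(pick_model(MODELS, "speed"))
```

With only two candidates the choice is obvious, but the same comparison applies to the longer recognition model tables.
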
Text Recognition Module:

Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
PP-OCRv4_server_rec_doc | Inference Model/Training Model | 81.53 | 6.65 / 2.38 | 32.92 / 32.92 | 74.7 M | PP-OCRv4_server_rec_doc is trained on a mixed dataset of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec. It has added the ability to recognize some traditional Chinese characters, Japanese, and special characters, and can support the recognition of more than 15,000 characters. In addition to improving the text recognition capability related to documents, it also enhances the general text recognition capability. |
PP-OCRv4_mobile_rec | Inference Model/Training Model | 78.74 | 4.82 / 1.20 | 16.74 / 4.64 | 10.6 M | The lightweight recognition model of PP-OCRv4 has high inference efficiency and can be deployed on various hardware devices, including edge devices. |
PP-OCRv4_server_rec | Inference Model/Training Model | 80.61 | 6.58 / 2.43 | 33.17 / 33.17 | 71.2 M | The server-side model of PP-OCRv4 offers high inference accuracy and can be deployed on various types of servers. |
en_PP-OCRv4_mobile_rec | Inference Model/Training Model | 70.39 | 4.81 / 0.75 | 16.10 / 5.31 | 6.8 M | The ultra-lightweight English recognition model, trained based on the PP-OCRv4 recognition model, supports the recognition of English letters and numbers. |

Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
PP-OCRv4_server_rec_doc | Inference Model/Training Model | 81.53 | 6.65 / 2.38 | 32.92 / 32.92 | 74.7 M | PP-OCRv4_server_rec_doc is trained on a mixed dataset of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec. It has added the recognition capabilities for some traditional Chinese characters, Japanese, and special characters. The number of recognizable characters is over 15,000. In addition to the improvement in document-related text recognition, it also enhances the general text recognition capability. |
PP-OCRv4_mobile_rec | Inference Model/Training Model | 78.74 | 4.82 / 1.20 | 16.74 / 4.64 | 10.6 M | The lightweight recognition model of PP-OCRv4 has high inference efficiency and can be deployed on various hardware devices, including edge devices. |
PP-OCRv4_server_rec | Inference Model/Training Model | 80.61 | 6.58 / 2.43 | 33.17 / 33.17 | 71.2 M | The server-side model of PP-OCRv4 offers high inference accuracy and can be deployed on various types of servers. |
PP-OCRv3_mobile_rec | Inference Model/Training Model | 72.96 | 5.87 / 1.19 | 9.07 / 4.28 | 9.2 M | PP-OCRv3’s lightweight recognition model is designed for high inference efficiency and can be deployed on a variety of hardware devices, including edge devices. |

Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
ch_SVTRv2_rec | Inference Model/Training Model | 68.81 | 8.08 / 2.74 | 50.17 / 42.50 | 73.9 M | SVTRv2 is a server text recognition model developed by the OpenOCR team of Fudan University's Visual and Learning Laboratory (FVL). It won the first prize in the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task. The end-to-end recognition accuracy on the A list is 6% higher than that of PP-OCRv4. |

Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
ch_RepSVTR_rec | Inference Model/Training Model | 65.07 | 5.93 / 1.62 | 20.73 / 7.32 | 22.1 M | The RepSVTR text recognition model is a mobile text recognition model based on SVTRv2. It won the first prize in the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task. The end-to-end recognition accuracy on the B list is 2.5% higher than that of PP-OCRv4, with the same inference speed. |

Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
en_PP-OCRv4_mobile_rec | Inference Model/Training Model | 70.39 | 4.81 / 0.75 | 16.10 / 5.31 | 6.8 M | The ultra-lightweight English recognition model trained based on the PP-OCRv4 recognition model supports the recognition of English and numbers. |
en_PP-OCRv3_mobile_rec | Inference Model/Training Model | 70.69 | 5.44 / 0.75 | 8.65 / 5.57 | 7.8 M | The ultra-lightweight English recognition model trained based on the PP-OCRv3 recognition model supports the recognition of English and numbers. |

Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
---|---|---|---|---|---|---|
korean_PP-OCRv3_mobile_rec | Inference Model/Training Model | 60.21 | 5.40 / 0.97 | 9.11 / 4.05 | 8.6 M | The ultra-lightweight Korean recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Korean and numbers. |
japan_PP-OCRv3_mobile_rec | Inference Model/Training Model | 45.69 | 5.70 / 1.02 | 8.48 / 4.07 | 8.8 M | The ultra-lightweight Japanese recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Japanese and numbers. |
chinese_cht_PP-OCRv3_mobile_rec | Inference Model/Training Model | 82.06 | 5.90 / 1.28 | 9.28 / 4.34 | 9.7 M | The ultra-lightweight Traditional Chinese recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Traditional Chinese and numbers. |
te_PP-OCRv3_mobile_rec | Inference Model/Training Model | 95.88 | 5.42 / 0.82 | 8.10 / 6.91 | 7.8 M | The ultra-lightweight Telugu recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Telugu and numbers. |
ka_PP-OCRv3_mobile_rec | Inference Model/Training Model | 96.96 | 5.25 / 0.79 | 9.09 / 3.86 | 8.0 M | The ultra-lightweight Kannada recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Kannada and numbers. |
ta_PP-OCRv3_mobile_rec | Inference Model/Training Model | 76.83 | 5.23 / 0.75 | 10.13 / 4.30 | 8.0 M | The ultra-lightweight Tamil recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Tamil and numbers. |
latin_PP-OCRv3_mobile_rec | Inference Model/Training Model | 76.93 | 5.20 / 0.79 | 8.83 / 7.15 | 7.8 M | The ultra-lightweight Latin recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Latin script and numbers. |
arabic_PP-OCRv3_mobile_rec | Inference Model/Training Model | 73.55 | 5.35 / 0.79 | 8.80 / 4.56 | 7.8 M | The ultra-lightweight Arabic script recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Arabic script and numbers. |
cyrillic_PP-OCRv3_mobile_rec | Inference Model/Training Model | 94.28 | 5.23 / 0.76 | 8.89 / 3.88 | 7.9 M | The ultra-lightweight cyrillic alphabet recognition model trained based on the PP-OCRv3 recognition model supports the recognition of cyrillic letters and numbers. |
devanagari_PP-OCRv3_mobile_rec | Inference Model/Training Model | 96.44 | 5.22 / 0.79 | 8.56 / 4.06 | 7.9 M | The ultra-lightweight Devanagari script recognition model trained based on the PP-OCRv3 recognition model supports the recognition of Devanagari script and numbers. |
Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
---|---|---|---|
Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |

Parameter | Description | Parameter Type | Default Value |
---|---|---|---|
input | Data to be predicted, supporting multiple input types. Required. | Python Var\|str\|list | |
save_path | The path for saving the inference result file. If set to `None`, the inference results will not be saved locally. | str | None |
doc_orientation_classify_model_name | The name of the document orientation classification model. If set to `None`, the default model in the pipeline will be used. | str | None |
doc_orientation_classify_model_dir | The directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | str | None |
doc_unwarping_model_name | The name of the text image unwarping model. If set to `None`, the default model in the pipeline will be used. | str | None |
doc_unwarping_model_dir | The directory path of the text image unwarping model. If set to `None`, the official model will be downloaded. | str | None |
layout_detection_model_name | The name of the layout detection model. If set to `None`, the default model in the pipeline will be used. | str | None |
layout_detection_model_dir | The directory path of the layout detection model. If set to `None`, the official model will be downloaded. | str | None |
seal_text_detection_model_name | The name of the seal text detection model. If set to `None`, the default model in the pipeline will be used. | str | None |
seal_text_detection_model_dir | The directory path of the seal text detection model. If set to `None`, the official model will be downloaded. | str | None |
text_recognition_model_name | The name of the text recognition model. If set to `None`, the default model in the pipeline will be used. | str | None |
text_recognition_model_dir | The directory path of the text recognition model. If set to `None`, the official model will be downloaded. | str | None |
text_recognition_batch_size | The batch size for the text recognition model. If set to `None`, the batch size defaults to `1`. | int | None |
use_doc_orientation_classify | Whether to enable document orientation classification. If set to `None`, the value initialized in the pipeline is used, which is `True`. | bool | None |
use_doc_unwarping | Whether to enable text image correction. If set to `None`, the value initialized in the pipeline is used, which is `True`. | bool | None |
use_layout_detection | Whether to load the layout detection module. If set to `None`, the value initialized in the pipeline is used, which is `True`. | bool | None |
layout_threshold | The threshold for layout detection, used to filter out predictions with low confidence. | float\|dict | None |
layout_nms | Whether to use NMS (Non-Maximum Suppression) post-processing for layout region detection to filter out overlapping boxes. If set to `None`, the default configuration of the official model will be used. | bool | None |
layout_unclip_ratio | The scaling factor for the side length of the detection boxes in layout region detection. | float\|list | None |
layout_merge_bboxes_mode | The merging mode for the detection boxes output by the model in layout region detection. | str | None |
seal_det_limit_side_len | The side length limit for seal detection images. | int\|None | None |
seal_det_limit_type | The type of side length limit for seal detection images. | str\|None | None |
seal_det_thresh | The pixel threshold for detection. In the output probability map, pixels with scores greater than this threshold are considered seal pixels. | float\|None | None |
seal_det_box_thresh | The bounding box threshold for detection. When the average score of all pixels within a detected bounding box is greater than this threshold, the result is considered a seal region. | float\|None | None |
seal_det_unclip_ratio | The expansion coefficient for seal detection, used to expand the seal region. The larger the value, the larger the expanded area. | float\|None | None |
seal_rec_score_thresh | The seal recognition threshold. Text results with scores greater than this threshold will be retained. | float\|None | None |
device | The device used for inference. Specific card numbers can be specified. | str | None |
enable_hpi | Whether to enable high-performance inference. | bool | False |
use_tensorrt | Whether to use TensorRT for inference acceleration. | bool | False |
min_subgraph_size | The minimum subgraph size, used to optimize the computation of model subgraphs. | int | 3 |
precision | The computational precision, such as fp32 or fp16. | str | fp32 |
enable_mkldnn | Whether to enable the MKL-DNN acceleration library. If set to `None`, it will be enabled by default. | bool | None |
cpu_threads | The number of threads used for inference on the CPU. | int | 8 |
paddlex_config | The path to the PaddleX pipeline configuration file. | str | None |
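
The descriptions of `seal_det_thresh`, `seal_det_box_thresh`, and `seal_rec_score_thresh` can be read as three successive filters. The sketch below illustrates that reading on toy data. The helper names, the sample probability map, and the threshold values are all hypothetical, and the real post-processing inside the models is more involved.

```python
def seal_pixel_mask(prob_map, thresh):
    """Mark pixels whose score is greater than the pixel threshold."""
    return [[score > thresh for score in row] for row in prob_map]


def box_is_seal_region(prob_map, box, box_thresh):
    """Return True if the mean score inside `box` exceeds `box_thresh`.

    `box` is (row_start, row_end, col_start, col_end), end-exclusive.
    """
    r0, r1, c0, c1 = box
    scores = [prob_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(scores) / len(scores) > box_thresh


def keep_texts(texts, scores, rec_thresh):
    """Retain recognized texts whose score is greater than the threshold."""
    return [t for t, s in zip(texts, scores) if s > rec_thresh]


# Toy probability map standing in for the detector output.
prob_map = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 0.7, 0.6],
]
mask = seal_pixel_mask(prob_map, thresh=0.5)
strong = box_is_seal_region(prob_map, (1, 3, 1, 3), box_thresh=0.6)
weak = box_is_seal_region(prob_map, (0, 2, 0, 2), box_thresh=0.6)
kept = keep_texts(["ACME", "??"], [0.95, 0.10], rec_thresh=0.5)
print(mask, strong, weak, kept)
```

Raising any of the three thresholds makes the pipeline stricter, which trades missed seals or characters for fewer false positives.
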

Parameter | Description | Type | Default Value |
---|---|---|---|
pipeline | The name of the pipeline or the path to the pipeline configuration file. If it is a pipeline name, it must be supported by PaddleX. | str | None |
config | Specific configuration information for the pipeline. If set together with `pipeline`, it takes priority over `pipeline`, and the pipeline name must be consistent with `pipeline`. | dict[str, Any] | None |
device | The device used for pipeline inference. It supports specifying the card number of a GPU, such as "gpu:0", the card number of other hardware, such as "npu:0", or the CPU, such as "cpu". Multiple devices can be specified simultaneously for parallel inference. For details, please refer to Pipeline Parallel Inference. | str | gpu:0 |
use_hpip | Whether to enable the high-performance inference plugin. If set to `None`, the setting from the configuration file or `config` will be used. | bool | None |
hpi_config | High-performance inference configuration. | dict\|None | None |
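
To make the `device` format concrete, here is a minimal sketch that splits a single-device string such as "gpu:0" into its parts. `parse_device` is a hypothetical helper written for illustration. It is not PaddleX's own parser and does not cover the multi-device form.

```python
def parse_device(device):
    """Split 'gpu:0' into ('gpu', 0); a bare 'cpu' gives ('cpu', None)."""
    kind, sep, index = device.partition(":")
    if not kind:
        raise ValueError(f"invalid device string: {device!r}")
    return kind, (int(index) if sep else None)


for spec in ("gpu:0", "npu:0", "cpu"):
    print(spec, "->", parse_device(spec))
```
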

Parameter | Description | Type | Options | Default Value |
---|---|---|---|---|
input | Data to be predicted, supports multiple input types (required) | Python Var\|str\|list | | |
device | Inference device for the pipeline | str\|None | | None |
use_doc_orientation_classify | Whether to use the document orientation classification module | bool\|None | | None |
use_doc_unwarping | Whether to use the document unwarping module | bool\|None | | None |
use_layout_detection | Whether to use the layout detection module | bool\|None | | None |
layout_threshold | Confidence threshold for layout detection; only scores above this threshold will be output | float\|dict\|None | | None |
layout_nms | Whether to use Non-Maximum Suppression (NMS) for layout detection post-processing | bool\|None | | None |
layout_unclip_ratio | Expansion ratio of detection box edges; if not specified, the default value from the PaddleX official model configuration will be used | float\|list\|None | | |
layout_merge_bboxes_mode | Merging mode for detection boxes in layout detection output; if not specified, the default value from the PaddleX official model configuration will be used | str\|None | | None |
seal_det_limit_side_len | Side length limit for seal text detection | int\|None | | None |
seal_rec_score_thresh | Text recognition threshold; text results with scores above this threshold will be retained | float\|None | | None |
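
Several of these parameters accept `None`, which the parameter descriptions explain as falling back to the value chosen at pipeline initialization. The following sketch shows that resolution rule in isolation. `resolve_option` and the sample dictionaries are hypothetical and only mirror the described behaviour.

```python
def resolve_option(call_value, init_value):
    """Use the per-call value unless it is None, then fall back to the init value."""
    return init_value if call_value is None else call_value


# Values assumed to have been fixed when the pipeline was created.
init_defaults = {"use_doc_orientation_classify": True, "use_layout_detection": True}

# Per-call arguments: None means "keep the initialization value".
call_args = {"use_doc_orientation_classify": False, "use_layout_detection": None}

effective = {k: resolve_option(call_args.get(k), v) for k, v in init_defaults.items()}
print(effective)
```

Note that an explicit `False` is kept as-is; only `None` triggers the fallback.
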

Method | Description | Parameter | Parameter Type | Parameter Description | Default Value |
---|---|---|---|---|---|
print() | Print results to the terminal | format_json | bool | Whether to format the output content using JSON indentation | True |
print() | Print results to the terminal | indent | int | Specify the indentation level to beautify the output JSON data for better readability, effective only when format_json is True | 4 |
print() | Print results to the terminal | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. When set to True, all non-ASCII characters will be escaped; False will retain the original characters, effective only when format_json is True | False |
save_to_json() | Save results as a json file | save_path | str | The file path to save the results. When it is a directory, the saved file name will be consistent with the input file type | None |
save_to_json() | Save results as a json file | indent | int | Specify the indentation level to beautify the output JSON data for better readability, effective only when format_json is True | 4 |
save_to_json() | Save results as a json file | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. When set to True, all non-ASCII characters will be escaped; False will retain the original characters, effective only when format_json is True | False |
save_to_img() | Save results as an image file | save_path | str | The file path to save the results, supports directory or file path | None |
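
The `indent` and `ensure_ascii` options behave much like the arguments of the same name in Python's standard `json.dumps`. The sketch below uses only the standard library on a made-up result dictionary to show their effect. It does not call the pipeline, and whether the result object uses `json.dumps` internally is an assumption.

```python
import json

# Made-up stand-in for a prediction result.
sample = {"text": "印章", "score": 0.99}

compact = json.dumps(sample, ensure_ascii=False)             # no indentation
pretty = json.dumps(sample, indent=4, ensure_ascii=False)    # keeps original characters
escaped = json.dumps(sample, indent=4, ensure_ascii=True)    # escapes non-ASCII characters

print(pretty)
print(escaped)
```

Both indented forms decode back to the same data; only the on-disk or on-screen representation differs.
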

Attribute | Description |
---|---|
json | Get the prediction results in json format. |
img | Get the visualization results in dict format. |

For the main operations provided by the service, when a request is processed successfully, the response status code is `200`, and the attributes of the response body are as follows:

Name | Type | Description |
---|---|---|
logId | string | The UUID of the request. |
errorCode | integer | Error code. Fixed as `0`. |
errorMsg | string | Error message. Fixed as `"Success"`. |
result | object | The result of the operation. |

When a request is not processed successfully, the attributes of the response body are as follows:

Name | Type | Description |
---|---|---|
logId | string | The UUID of the request. |
errorCode | integer | Error code. Same as the response status code. |
errorMsg | string | Error message. |
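
A client can use these attributes to separate successful responses from failures. The sketch below is one way to do that, assuming the response body has already been decoded into a Python dictionary. `unwrap_response` and the sample bodies are hypothetical and not part of any PaddleX client library.

```python
def unwrap_response(status_code, body):
    """Return body['result'] on success, otherwise raise with the error details."""
    if status_code == 200 and body.get("errorCode") == 0:
        return body["result"]
    raise RuntimeError(
        f"request {body.get('logId')} failed: "
        f"{body.get('errorCode')} {body.get('errorMsg')}"
    )


# Made-up bodies shaped like the two tables above.
ok_body = {"logId": "abc", "errorCode": 0, "errorMsg": "Success", "result": {"sealRecResults": []}}
bad_body = {"logId": "def", "errorCode": 422, "errorMsg": "Invalid input"}

print(unwrap_response(200, ok_body))
```
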

The main operations provided by the service are as follows:

- `infer`

Obtain the seal text recognition result.

`POST /seal-recognition`

Name | Type | Description | Required |
---|---|---|---|
file | string | The URL of an image or PDF file accessible by the server, or the Base64-encoded content of the file. By default, for PDF files exceeding 10 pages, only the content of the first 10 pages will be processed. To remove the page limit, please add the following configuration to the pipeline configuration file: | Yes |
fileType | integer \| null | The type of file. `0` indicates a PDF file, `1` indicates an image file. If this attribute is not present in the request body, the file type will be inferred from the URL. | No |
useDocOrientationClassify | boolean \| null | Please refer to the description of the use_doc_orientation_classify parameter of the pipeline object's predict method. | No |
useDocUnwarping | boolean \| null | Please refer to the description of the use_doc_unwarping parameter of the pipeline object's predict method. | No |
useLayoutDetection | boolean \| null | Please refer to the description of the use_layout_detection parameter of the pipeline object's predict method. | No |
layoutThreshold | number \| null | Please refer to the description of the layout_threshold parameter of the pipeline object's predict method. | No |
layoutNms | boolean \| null | Please refer to the description of the layout_nms parameter of the pipeline object's predict method. | No |
layoutUnclipRatio | number \| array \| null | Please refer to the description of the layout_unclip_ratio parameter of the pipeline object's predict method. | No |
layoutMergeBboxesMode | string \| null | Please refer to the description of the layout_merge_bboxes_mode parameter of the pipeline object's predict method. | No |
sealDetLimitSideLen | integer \| null | Please refer to the description of the seal_det_limit_side_len parameter of the pipeline object's predict method. | No |
sealDetLimitType | string \| null | Please refer to the description of the seal_det_limit_type parameter of the pipeline object's predict method. | No |
sealDetThresh | number \| null | Please refer to the description of the seal_det_thresh parameter of the pipeline object's predict method. | No |
sealDetBoxThresh | number \| null | Please refer to the description of the seal_det_box_thresh parameter of the pipeline object's predict method. | No |
sealDetUnclipRatio | number \| null | Please refer to the description of the seal_det_unclip_ratio parameter of the pipeline object's predict method. | No |
sealRecScoreThresh | number \| null | Please refer to the description of the seal_rec_score_thresh parameter of the pipeline object's predict method. | No |
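
Because only `file` is required, a request body can be assembled by Base64-encoding the file content and adding just the optional attributes that are actually set. The sketch below shows one way to do this. `build_payload` is a hypothetical helper, and the byte string stands in for the content of a real image file.

```python
import base64


def build_payload(file_bytes, file_type, **options):
    """Build the JSON body: Base64-encode the file and drop unset options."""
    payload = {
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "fileType": file_type,  # 0 = PDF, 1 = image
    }
    # Leave out any optional attribute whose value is None.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_payload(
    b"fake image bytes",      # placeholder; read a real file in practice
    1,
    useLayoutDetection=False,
    sealRecScoreThresh=None,  # unset, so it is left out of the body
)
print(sorted(payload))
```
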

The `result` in the response body has the following properties:

Name | Type | Meaning |
---|---|---|
sealRecResults | array | The seal text recognition results. The array length is 1 (for image input) or the actual number of document pages processed (for PDF input). For PDF input, each element in the array represents the result of one page actually processed in the PDF file. |
dataInfo | object | Information about the input data. |

Each element in `sealRecResults` is an `object` with the following properties:

Name | Type | Meaning |
---|---|---|
prunedResult | object | A simplified version of the res field in the JSON representation generated by the predict method of the pipeline object, where the input_path and the page_index fields are removed. |
outputImages | object \| null | See the description of the img attribute of the result of the pipeline prediction. The images are in JPEG format and encoded in Base64. |
inputImage | string \| null | The input image. The image is in JPEG format and encoded in Base64. |

    import base64
    import requests

    API_URL = "http://localhost:8080/seal-recognition"
    file_path = "./demo.jpg"

    # Read the local image and encode it as Base64 for the request body
    with open(file_path, "rb") as file:
        file_bytes = file.read()
        file_data = base64.b64encode(file_bytes).decode("ascii")

    payload = {"file": file_data, "fileType": 1}  # fileType 1 = image file

    # Call the service
    response = requests.post(API_URL, json=payload)

    # Process the response: print each result and save the returned images
    assert response.status_code == 200
    result = response.json()["result"]
    for i, res in enumerate(result["sealRecResults"]):
        print(res["prunedResult"])
        for img_name, img in res["outputImages"].items():
            img_path = f"{img_name}_{i}.jpg"
            with open(img_path, "wb") as f:
                f.write(base64.b64decode(img))
            print(f"Output image saved at {img_path}")

Scenario | Fine-Tuning Module | Fine-Tuning Reference Link |
---|---|---|
Inaccurate or missing seal position detection | Layout Detection Module | Link |
Missing text detection | Text Detection Module | Link |
Inaccurate text content | Text Recognition Module | Link |
Inaccurate full-image rotation correction | Document Image Orientation Classification Module | Link |
Inaccurate image distortion correction | Text Image Correction Module | Not supported for fine-tuning |