docs: Fix formatting (#14891)
* docs: Fix formatting
* Fix typo
* Fix translation
* Fix formatting
* Fix formatting

pull/15006/head
parent 715b1d9aa4
commit 332d9d5112
docs
datasets
ppocr
blog
model_train
@@ -5,7 +5,7 @@ comments: true
 
 # OCR datasets
 
-Here is a list of public datasets commonly used in OCR, which are being continuously updated. Welcome to contribute datasets~
+Here is a list of public datasets commonly used in OCR, which are being continuously updated. Welcome to contribute datasets!
 
 ## 1. Text detection
 

@@ -34,10 +34,10 @@ Take rec_chinese_lite_train_v2.0.yml as an example
 | checkpoints | set model parameter path | None | Used to load parameters after interruption to continue training|
 | use_visualdl | Set whether to enable visualdl for visual log display | False | [Tutorial](https://www.paddlepaddle.org.cn/paddle/visualdl) |
 | use_wandb | Set whether to enable W&B for visual log display | False | [Documentation](https://docs.wandb.ai/)
-| infer_img | Set inference image path or folder path | ./infer_img | \||
+| infer_img | Set inference image path or folder path | ./infer_img | \ |
 | character_dict_path | Set dictionary path | ./ppocr/utils/ppocr_keys_v1.txt | If the character_dict_path is None, model can only recognize number and lower letters |
 | max_text_length | Set the maximum length of text | 25 | \ |
-| use_space_char | Set whether to recognize spaces | True | \| |
+| use_space_char | Set whether to recognize spaces | True | \ |
 | label_list | Set the angle supported by the direction classifier | ['0','180'] | Only valid in angle classifier model |
 | save_res_path | Set the save address of the test model results | ./output/det_db/predicts_db.txt | Only valid in the text detection model |
 
@@ -50,10 +50,10 @@ Take rec_chinese_lite_train_v2.0.yml as an example
 | beta2 | Set the exponential decay rate for the 2nd moment estimates | 0.999 | \ |
 | clip_norm | The maximum norm value | - | \ |
 | **lr** | Set the learning rate decay method | - | \ |
-| name | Learning rate decay class name | Cosine | Currently supports`Linear`,`Cosine`,`Step`,`Piecewise`, see[ppocr/optimizer/learning_rate.py](../../ppocr/optimizer/learning_rate.py) |
+| name | Learning rate decay class name | Cosine | Currently supports`Linear`,`Cosine`,`Step`,`Piecewise`, see [ppocr/optimizer/learning_rate.py](../../ppocr/optimizer/learning_rate.py) |
 | learning_rate | Set the base learning rate | 0.001 | \ |
 | **regularizer** | Set network regularization method | - | \ |
-| name | Regularizer class name | L2 | Currently support`L1`,`L2`, see[ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py) |
+| name | Regularizer class name | L2 | Currently support`L1`,`L2`, see [ppocr/optimizer/regularizer.py](../../ppocr/optimizer/regularizer.py) |
 | factor | Regularizer coefficient | 0.00001 | \ |
 
 ### Architecture ([ppocr/modeling](../../ppocr/modeling))
@@ -73,12 +73,12 @@ In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck
 | name | backbone class name | ResNet | Currently support`MobileNetV3`,`ResNet` |
 | layers | resnet layers | 34 | Currently support18,34,50,101,152,200 |
 | model_name | MobileNetV3 network size | small | Currently support`small`,`large` |
-| **Neck** | Set network neck | - | see[ppocr/modeling/necks](../../ppocr/modeling/necks) |
+| **Neck** | Set network neck | - | see [ppocr/modeling/necks](../../ppocr/modeling/necks) |
 | name | neck class name | SequenceEncoder | Currently support`SequenceEncoder`,`DBFPN` |
 | encoder_type | SequenceEncoder encoder type | rnn | Currently support`reshape`,`fc`,`rnn` |
 | hidden_size | rnn number of internal units | 48 | \ |
 | out_channels | Number of DBFPN output channels | 256 | \ |
-| **Head** | Set the network head | - | see[ppocr/modeling/heads](../../ppocr/modeling/heads) |
+| **Head** | Set the network head | - | see [ppocr/modeling/heads](../../ppocr/modeling/heads) |
 | name | head class name | CTCHead | Currently support`CTCHead`,`DBHead`,`ClsHead` |
 | fc_decay | CTCHead regularization coefficient | 0.0004 | \ |
 | k | DBHead binarization coefficient | 50 | \ |
@@ -121,7 +121,7 @@ In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck
 | data_dir | Image folder path | ./train_data | \ |
 | label_file_list | Groundtruth file path | ["./train_data/train_list.txt"] | This parameter is not required when dataset is LMDBDataSet |
 | ratio_list | Ratio of data set | [1.0] | If there are two train_lists in label_file_list and ratio_list is [0.4,0.6], 40% will be sampled from train_list1, and 60% will be sampled from train_list2 to combine the entire dataset |
-| transforms | List of methods to transform images and labels | [DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys] | see[ppocr/data/imaug](../../ppocr/data/imaug) |
+| transforms | List of methods to transform images and labels | [DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys] | see [ppocr/data/imaug](../../ppocr/data/imaug) |
 | **loader** | dataloader related | - | |
 | shuffle | Does each epoch disrupt the order of the data set | True | \ |
 | batch_size_per_card | Single card batch size during training | 256 | \ |

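Note on the `ratio_list` row in the hunk above: the sampling it describes can be sketched roughly as follows. This is only an illustration, not PaddleOCR's actual loader code; the helper name `sample_by_ratio`, the fixed seed, and the choice to round the sample size are all assumptions made for the example.

```python
import random


def sample_by_ratio(label_lists, ratio_list, seed=0):
    """Combine several label lists, keeping a fraction of each one."""
    if len(label_lists) != len(ratio_list):
        raise ValueError("label_lists and ratio_list must have the same length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    combined = []
    for lines, ratio in zip(label_lists, ratio_list):
        # Assumption: the sample size is rounded to the nearest integer.
        keep = round(len(lines) * ratio)
        combined.extend(rng.sample(lines, keep))
    return combined


# Two hypothetical label files with ten entries each.
train_list1 = [f"img1_{i}.jpg\ttext{i}" for i in range(10)]
train_list2 = [f"img2_{i}.jpg\ttext{i}" for i in range(10)]

# ratio_list [0.4, 0.6]: 40% of train_list1 plus 60% of train_list2.
dataset = sample_by_ratio([train_list1, train_list2], [0.4, 0.6])
print(len(dataset))
```

With ten entries per list, four lines come from the first list and six from the second, so the combined dataset has ten entries.
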
@@ -10,7 +10,7 @@ This section uses the icdar2015 dataset as an example to introduce the training,
 
 ### 1.1 Data Preparation
 
-To prepare datasets, refer to [ocr_datasets](../../datasets/ocr_datasets.en.md) .
+To prepare datasets, refer to [ocr_datasets](../../datasets/ocr_datasets.en.md).
 
 ### 1.2 Download Pre-trained Model
 

@@ -55,7 +55,7 @@ Optimizer:
 
 ### 2.3 Evaluation Indicators
 
-(1) Detection stage: First, evaluate according to the IOU of the detection frame and the labeled frame. If the IOU is greater than a certain threshold, it is judged that the detection is accurate. Here, the detection frame and the label frame are different from the general general target detection frame, and they are represented by polygons. Detection accuracy: the percentage of the correct detection frame number in all detection frames is mainly used to judge the detection index. Detection recall rate: the percentage of correct detection frames in all marked frames, which is mainly an indicator of missed detection.
+(1) Detection stage: First, evaluate according to the IOU of the detection frame and the labeled frame. If the IOU is greater than a certain threshold, it is judged that the detection is accurate. Here, the detection frame and the label frame are different from the general target detection frame, and they are represented by polygons. Detection accuracy: the percentage of the correct detection frame number in all detection frames is mainly used to judge the detection index. Detection recall rate: the percentage of correct detection frames in all marked frames, which is mainly an indicator of missed detection.
 
 (2) Recognition stage: Character recognition accuracy, that is, the ratio of correctly recognized text lines to the number of marked text lines. Only the entire line of text recognition pairs can be regarded as correct recognition.
 

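Note on the evaluation indicators in the hunk above: a small worked example may make the definitions concrete. The sketch below is a simplification, not PaddleOCR's evaluation code. It uses axis-aligned boxes instead of the polygons the text mentions, a greedy one-to-one matching, and an IoU threshold of 0.5; all three are assumptions, as are the function names.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(detections, labels, iou_threshold=0.5):
    """Detection accuracy (precision) and recall with greedy matching."""
    matched = set()  # indices of labeled boxes already claimed by a detection
    correct = 0
    for det in detections:
        for idx, gt in enumerate(labels):
            if idx not in matched and box_iou(det, gt) > iou_threshold:
                matched.add(idx)
                correct += 1
                break
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(labels) if labels else 0.0
    return precision, recall


def line_accuracy(predictions, labels):
    """Share of text lines whose whole predicted string equals the label."""
    correct = sum(pred == truth for pred, truth in zip(predictions, labels))
    return correct / len(labels) if labels else 0.0


detections = [(0, 0, 10, 10), (20, 20, 30, 30)]
labeled = [(0, 0, 10, 9), (50, 50, 60, 60), (100, 100, 110, 110)]
precision, recall = detection_metrics(detections, labeled)
accuracy = line_accuracy(["hello", "w0rld"], ["hello", "world"])
print(precision, recall, accuracy)
```

Here one of two detections matches a labeled box, so detection accuracy is 1/2 and recall is 1/3. One of the two text lines is recognized exactly, so recognition accuracy is 1/2; the line with a single wrong character counts as fully incorrect.
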
@@ -2,9 +2,9 @@
 comments: true
 ---
 
-Here we have sorted out some Chinese OCR training and prediction tricks, which are being updated continuously. You are welcome to contribute more OCR tricks ~
+Here we have sorted out some Chinese OCR training and prediction tricks, which are being updated continuously. You are welcome to contribute more OCR tricks!
 
-#### 1、Replace Backbone Network
+#### 1. Replace Backbone Network
 
 - **Problem Description**
 
@@ -17,7 +17,7 @@ Here we have sorted out some Chinese OCR training and prediction tricks, which a
 
 - In order to replace the backbone network of text recognition, we need to pay attention to the descending position of network width and height stride. Since the ratio between width and height is large in chinese text recognition, the frequency of height decrease is less and the frequency of width decrease is more. You can refer the [modifies of MobileNetV3](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/modeling/backbones/rec_mobilenet_v3.py) in PaddleOCR.
 
-#### 2、Long Chinese Text Recognition
+#### 2. Long Chinese Text Recognition
 
 - **Problem Description**
 The maximum resolution of Chinese recognition model during training is [3,32,320], if the text image to be recognized is too long, as shown in the figure below, how to adapt?
@@ -50,7 +50,7 @@ Here we have sorted out some Chinese OCR training and prediction tricks, which a
 return padding_im
 ```
 
-#### 3、Space Recognition
+#### 3. Space Recognition
 
 - **Problem Description**
 