update finetune doc (#8774)

xiaoting 2023-01-04 18:41:16 +08:00 committed by GitHub
@ -103,6 +103,66 @@ The configuration files provided by PaddleOCR are for training on 8 GPUs, which corresponds to a total batch size of `8*
For more PP-OCR series models, please refer to the [PP-OCR Series Model Library](./models_list.md).
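Following the batch-size note above: if you fine-tune with a smaller total batch size than the 8-GPU default, the learning rate is usually scaled down linearly. A hedged sketch of an `Optimizer` fragment; the base value `0.001`, the batch sizes, and the cosine schedule are illustrative assumptions, not values taken from a specific released config:

```yaml
Optimizer:
  lr:
    name: Cosine
    # assumed base: learning_rate 0.001 at a total batch size of 8*128 = 1024;
    # fine-tuning on 1 GPU with batch size 128 scales it by 128/1024:
    learning_rate: 0.000125
    warmup_epoch: 2
```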
The PP-OCRv3 model uses the GTC strategy, in which the SAR branch has a large number of parameters. When the training data covers only simple scenes, the model overfits easily and fine-tuning results suffer, so it is recommended to remove the GTC branch. Modify the model-structure section of the configuration file as follows:
```yaml
Architecture:
model_type: rec
algorithm: SVTR
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
Neck:
name: SequenceEncoder
encoder_type: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: False
Head:
name: CTCHead
fc_decay: 0.00001
Loss:
name: CTCLoss
Train:
dataset:
......
transforms:
# remove the RecConAug augmentation
# - RecConAug:
# prob: 0.5
# ext_data_num: 2
# image_shape: [48, 320, 3]
# max_text_length: *max_text_length
- RecAug:
# change the label encoding to CTC
- CTCLabelEncode:
- KeepKeys:
keep_keys:
- image
- label
- length
...
Eval:
dataset:
...
transforms:
...
- CTCLabelEncode:
- KeepKeys:
keep_keys:
- image
- label
- length
...
```
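With the GTC branch removed as above, the backbone and neck weights of the released PP-OCRv3 recognition model can still be loaded as initialization. A hypothetical `Global` fragment; the local paths under `./pretrain_models/` and `./output/` are assumptions, so point them at wherever you unpacked the pretrained weights and want checkpoints saved:

```yaml
Global:
  # downloaded PP-OCRv3 recognition training weights (assumed location)
  pretrained_model: ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
  save_model_dir: ./output/rec_ppocr_v3_finetune
```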
### 3.3 Training hyperparameter selection
@ -163,6 +223,9 @@ Train:
ratio_list: [1.0, 0.1]
```
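For context, `ratio_list` sets the per-file sampling ratio when several label files are mixed. A hedged sketch of a full `Train.dataset` block, assuming the `SimpleDataSet` reader; the directory and file names are placeholders:

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    label_file_list:
      - ./train_data/real_label.txt   # real data, sampled fully each epoch
      - ./train_data/synth_label.txt  # synthetic data, 10% sampled each epoch
    ratio_list: [1.0, 0.1]
```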
### 3.4 Training optimization
Training is not finished in one pass. After completing a stage of training and evaluation, it is recommended to collect and analyze the bad cases of the current model in real scenarios, adjust the training data ratios accordingly, or add more synthetic data, and keep improving the model over multiple training iterations.
If you change the custom dictionary during fine-tuning, the parameters of the final FC layer cannot be loaded, so acc=0 in the early iterations is normal and not a cause for concern; loading the pretrained model still speeds up convergence.
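The dictionary change mentioned above is configured through `Global.character_dict_path`; when it differs from the dictionary the pretrained model used, the output dimension of the final FC layer changes and that layer is re-initialized. A hypothetical fragment, where the custom dictionary path is a placeholder:

```yaml
Global:
  # custom dictionary: one character per line (placeholder path)
  character_dict_path: ./train_data/my_dict.txt
  use_space_char: true
```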


@ -103,6 +103,66 @@ It is recommended to choose the PP-OCRv3 model (configuration file: [ch_PP-OCRv3
For more PP-OCR series models, please refer to the [PP-OCR Series Model Library](./models_list_en.md).
The PP-OCRv3 model uses the GTC strategy, in which the SAR branch has a large number of parameters. When the training data covers only simple scenes, the model overfits easily and fine-tuning results suffer, so it is recommended to remove the GTC branch. Modify the model-structure section of the configuration file as follows:
```yaml
Architecture:
model_type: rec
algorithm: SVTR
Transform:
Backbone:
name: MobileNetV1Enhance
scale: 0.5
last_conv_stride: [1, 2]
last_pool_type: avg
Neck:
name: SequenceEncoder
encoder_type: svtr
dims: 64
depth: 2
hidden_dims: 120
use_guide: False
Head:
name: CTCHead
fc_decay: 0.00001
Loss:
name: CTCLoss
Train:
dataset:
......
transforms:
# remove the RecConAug augmentation
# - RecConAug:
# prob: 0.5
# ext_data_num: 2
# image_shape: [48, 320, 3]
# max_text_length: *max_text_length
- RecAug:
# change the label encoding to CTC
- CTCLabelEncode:
- KeepKeys:
keep_keys:
- image
- label
- length
...
Eval:
dataset:
...
transforms:
...
- CTCLabelEncode:
- KeepKeys:
keep_keys:
- image
- label
- length
...
```
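With the GTC branch removed as above, the backbone and neck weights of the released PP-OCRv3 recognition model can still be loaded as initialization. A hypothetical `Global` fragment; the local paths under `./pretrain_models/` and `./output/` are assumptions, so point them at wherever you unpacked the pretrained weights and want checkpoints saved:

```yaml
Global:
  # downloaded PP-OCRv3 recognition training weights (assumed location)
  pretrained_model: ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
  save_model_dir: ./output/rec_ppocr_v3_finetune
```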
### 3.3 Training hyperparameter selection
@ -165,3 +225,5 @@ Train:
### 3.4 Training optimization
Training is not finished in one pass. After completing a stage of training and evaluation, it is recommended to collect and analyze the bad cases of the current model in real scenarios, adjust the training data ratios accordingly, or add more synthetic data, and keep improving the model over multiple training iterations.
If you change the custom dictionary during fine-tuning, the parameters of the final FC layer cannot be loaded, so acc=0 in the early iterations is normal and not a cause for concern; loading the pretrained model still speeds up convergence.
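The dictionary change mentioned above is configured through `Global.character_dict_path`; when it differs from the dictionary the pretrained model used, the output dimension of the final FC layer changes and that layer is re-initialized. A hypothetical fragment, where the custom dictionary path is a placeholder:

```yaml
Global:
  # custom dictionary: one character per line (placeholder path)
  character_dict_path: ./train_data/my_dict.txt
  use_space_char: true
```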