diff --git a/doc/doc_ch/table_recognition.md b/doc/doc_ch/table_recognition.md
index 156ba80e37..f09dedd038 100644
--- a/doc/doc_ch/table_recognition.md
+++ b/doc/doc_ch/table_recognition.md
@@ -14,6 +14,9 @@
     - [2.5. Distributed training](#25-分布式训练)
     - [2.6. Other training environments](#26-其他训练环境)
     - [2.7. Model fine-tuning](#27-模型微调)
+        - [2.7.1 Data selection](#271-数据选择)
+        - [2.7.2 Model selection](#272-模型选择)
+        - [2.7.3 Training hyperparameter selection](#273-训练超参选择)
 - [3. Model evaluation and prediction](#3-模型评估与预测)
     - [3.1. Metric evaluation](#31-指标评估)
     - [3.2. Testing table structure recognition](#32-测试表格结构识别效果)
@@ -219,7 +222,39 @@ Running on a DCU device requires setting the environment variable `export HIP_VISIBLE_DEVICES=0,1,2,3`
 
 ## 2.7. Model fine-tuning
 
-In practice, we recommend loading the officially provided pre-trained model and fine-tuning it on your own dataset. For the fine-tuning method, see the [model fine-tuning tutorial](./finetune.md).
+### 2.7.1 Data selection
+
+Data volume: prepare a table recognition dataset of at least 2000 images for model fine-tuning.
+
+### 2.7.2 Model selection
+
+We recommend fine-tuning the SLANet model (configuration file: [SLANet_ch.yml](../../configs/table/SLANet_ch.yml), pre-trained model: [ch_ppstructure_mobile_v2.0_SLANet_train.tar](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar)). Of the Chinese table pre-trained models currently provided, it has the best accuracy and generalization.
+
+For more table recognition models, see the [PP-Structure series model library](../../ppstructure/docs/models_list.md).
+
+### 2.7.3 Training hyperparameter selection
+
+When fine-tuning, the most important hyperparameters are the pre-trained model path `pretrained_model` and the learning rate `learning_rate`. The relevant part of the configuration file is shown below.
+
+```yaml
+Global:
+  pretrained_model: ./ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy.pdparams # Pre-trained model path
+Optimizer:
+  lr:
+    name: Cosine
+    learning_rate: 0.001 # Learning rate; scale it with the total batch size
+    warmup_epoch: 0
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
+
+In the configuration above, first set the `pretrained_model` field to the path of the `best_accuracy.pdparams` file.
+
+The configuration file provided by PaddleOCR assumes 4-GPU training (a total batch size of `4*48=192`) with no pre-trained model loaded. In your scenario, adjust the learning rate linearly with the total batch size, for example:
+
+* If you train on a single GPU with a per-GPU batch_size of 48, the total batch_size is 48, so a learning rate of about `0.00025` is recommended.
+* If you train on a single GPU and GPU memory limits the per-GPU batch_size to 32, the total batch_size is 32, so a learning rate of about `0.00017` is recommended.
 
 # 3. Model evaluation and prediction
 
diff --git a/doc/doc_en/table_recognition_en.md b/doc/doc_en/table_recognition_en.md
index cff2933df2..d79d98936e 100644
--- a/doc/doc_en/table_recognition_en.md
+++ b/doc/doc_en/table_recognition_en.md
@@ -14,6 +14,9 @@ This article provides a full-process guide for the PaddleOCR table recognition m
     - [2.5. Distributed Training](#25-distributed-training)
     - [2.6. Training on other platform(Windows/macOS/Linux DCU)](#26-training-on-other-platformwindowsmacoslinux-dcu)
     - [2.7. Fine-tuning](#27-fine-tuning)
+        - [2.7.1 Dataset](#271-dataset)
+        - [2.7.2 Model selection](#272-model-selection)
+        - [2.7.3 Training hyperparameter selection](#273-training-hyperparameter-selection)
 - [3. Evaluation and Test](#3-evaluation-and-test)
     - [3.1. Evaluation](#31-evaluation)
     - [3.2. Test table structure recognition effect](#32-test-table-structure-recognition-effect)
@@ -226,8 +229,40 @@ Running on a DCU device requires setting the environment variable `export HIP_VI
 
 ## 2.7. Fine-tuning
 
-In the actual use process, it is recommended to load the officially provided pre-training model and fine-tune it in your own data set. For the fine-tuning method of the table recognition model, please refer to: [Model fine-tuning tutorial](./finetune.md).
+### 2.7.1 Dataset
+
+Data volume: prepare a table recognition dataset of at least 2000 images for model fine-tuning.
+
+### 2.7.2 Model selection
+
+We recommend fine-tuning the SLANet model (configuration file: [SLANet_ch.yml](../../configs/table/SLANet_ch.yml), pre-trained model: [ch_ppstructure_mobile_v2.0_SLANet_train.tar](https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_train.tar)). Of the Chinese table pre-trained models currently provided, it has the best accuracy and generalization.
+
+For more table recognition models, please refer to [PP-Structure Series Model Library](../../ppstructure/docs/models_list.md).
+
+### 2.7.3 Training hyperparameter selection
+
+When fine-tuning, the most important hyperparameters are the pre-trained model path `pretrained_model` and the learning rate `learning_rate`. The relevant part of the configuration file is shown below.
+
+```yaml
+Global:
+  pretrained_model: ./ch_ppstructure_mobile_v2.0_SLANet_train/best_accuracy.pdparams # Pre-trained model path
+Optimizer:
+  lr:
+    name: Cosine
+    learning_rate: 0.001 # Learning rate; scale it with the total batch size
+    warmup_epoch: 0
+  regularizer:
+    name: 'L2'
+    factor: 0
+```
+
+In the configuration above, first set the `pretrained_model` field to the path of the `best_accuracy.pdparams` file.
+
+The configuration file provided by PaddleOCR assumes 4-GPU training (a total batch size of `4*48=192`) with no pre-trained model loaded. In your scenario, adjust the learning rate linearly with the total batch size, for example:
+
+* If you train on a single GPU with a per-GPU batch_size of 48, the total batch_size is 48, so a learning rate of about `0.00025` is recommended.
+* If you train on a single GPU and GPU memory limits the per-GPU batch_size to 32, the total batch_size is 32, so a learning rate of about `0.00017` is recommended.
 
 # 3. Evaluation and Test
 
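
Note on section 2.7.3: the two suggested learning rates follow from scaling the reference value `0.001` by the ratio of your total batch size to the reference total of `4*48=192`. The following is a minimal Python sketch of that arithmetic, not code from PaddleOCR; the helper name `scale_learning_rate` and the constant names are hypothetical and used only for illustration.

```python
# Reference values taken from the configuration discussed in section 2.7.3.
BASE_LR = 0.001            # learning_rate in the provided config
BASE_TOTAL_BATCH = 4 * 48  # 4 GPUs x 48 samples per GPU = 192


def scale_learning_rate(num_gpus, batch_per_gpu,
                        base_lr=BASE_LR, base_total_batch=BASE_TOTAL_BATCH):
    """Scale the learning rate linearly with the total batch size.

    Hypothetical helper for illustration; it is not a PaddleOCR API.
    """
    total_batch = num_gpus * batch_per_gpu
    return base_lr * total_batch / base_total_batch


if __name__ == "__main__":
    # Single GPU, batch size 48: one quarter of the reference batch size.
    print(f"{scale_learning_rate(1, 48):.5f}")
    # Single GPU, batch size 32: one sixth of the reference batch size.
    print(f"{scale_learning_rate(1, 32):.5f}")
```

Rounded to five decimal places, the two calls correspond to the `0.00025` and `0.00017` values recommended in the bullets. The result would then be written into the `learning_rate` field of the YAML configuration by hand.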