add pse curved text detection doc

2025-06-03 21:53:39 +08:00 · 2022-04-26 22:53:40 +08:00 · d8b33ba187
commit d8b33ba187
parent 16bacf0300
4 changed files with 28 additions and 8 deletions
--- a/doc/doc_ch/algorithm_det_psenet.md
+++ b/doc/doc_ch/algorithm_det_psenet.md
@@ -52,17 +52,27 @@
 python3 tools/export_model.py -c configs/det/det_r50_vd_pse.yml -o Global.pretrained_model=./det_r50_vd_pse_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_pse
 ```

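-For PSE text detection model inference, you can run the following command:
+For PSE text detection model inference on non-curved text, you can run the following command: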

 ```shell
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE"
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=quad
 ```

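 The visualized text detection results are saved to the `./inference_results` folder by default, and result file names are prefixed with 'det_res'. An example result is shown below: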

 ![](../imgs_results/det_res_img_10_pse.jpg)

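-**Note**: Because the ICDAR2015 dataset has only 1,000 training images and mainly targets English scenes, the above model performs relatively poorly on Chinese text images.
+To perform curved text detection, you can run the following command: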
+
+```shell
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=poly
+```
+
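+The visualized text detection results are saved to the `./inference_results` folder by default, and result file names are prefixed with 'det_res'. An example result is shown below: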
+
+![](../imgs_results/det_res_img_10_pse_poly.jpg)
+
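+**Note**: Because the ICDAR2015 dataset has only 1,000 training images and mainly targets English scenes, the above model performs relatively poorly on Chinese or curved text images.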

 <a name="4-2"></a>
 ### 4.2 C++ Inference
--- a/doc/doc_en/algorithm_det_psenet_en.md
+++ b/doc/doc_en/algorithm_det_psenet_en.md
@@ -52,17 +52,27 @@ First, convert the model saved in the PSE text detection training process into a
 python3 tools/export_model.py -c configs/det/det_r50_vd_pse.yml -o Global.pretrained_model=./det_r50_vd_pse_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_pse
 ```

-PSE text detection model inference, you can execute the following command:
+To run PSE text detection inference on non-curved text, use the following command:

 ```shell
-python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE"
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=quad
 ```

 The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:

 ![](../imgs_results/det_res_img_10_pse.jpg)

-**Note**: Since the ICDAR2015 dataset has only 1,000 training images, mainly for English scenes, the above model has very poor detection result on Chinese text images.
+If you want to perform curved text detection, you can execute the following command:
+
+```shell
+python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=poly
+```
+
+The visualized text detection results are saved to the `./inference_results` folder by default, and the name of the result file is prefixed with 'det_res'. Examples of results are as follows:
+
+![](../imgs_results/det_res_img_10_pse_poly.jpg)
+
+**Note**: Because the ICDAR2015 dataset has only 1,000 training images and mainly covers English scenes, the above model performs very poorly on Chinese or curved text images.


 <a name="4-2"></a>
--- a/doc/imgs_results/det_res_img_10_pse_poly.jpg
+++ b/doc/imgs_results/det_res_img_10_pse_poly.jpg
--- a/tools/infer/predict_det.py
+++ b/tools/infer/predict_det.py
@@ -158,7 +158,7 @@ class TextDetector(object):
        rect[1] = pts[np.argmin(diff)]
        rect[3] = pts[np.argmax(diff)]
        return rect
-    
+
    def clip_det_res(self, points, img_height, img_width):
        for pno in range(points.shape[0]):
            points[pno, 0] = int(min(max(points[pno, 0], 0), img_width - 1))
@@ -284,7 +284,7 @@ if __name__ == "__main__":
            total_time += elapse
        count += 1
        save_pred = os.path.basename(image_file) + "\t" + str(
-            json.dumps(np.array(dt_boxes).astype(np.int32).tolist())) + "\n"
+            json.dumps([x.tolist() for x in dt_boxes])) + "\n"
        save_results.append(save_pred)
        logger.info(save_pred)
        logger.info("The predict time of {}: {}".format(image_file, elapse))
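
The `save_pred` change in the last hunk matters once `--det_pse_box_type=poly` is used: polygon boxes can differ in their number of points, so they no longer stack into one rectangular array, and the old `np.array(dt_boxes).astype(np.int32)` call would likely fail on such ragged input (the exact error depends on the NumPy version). The sketch below illustrates the per-box serialization, assuming `dt_boxes` is a list of NumPy arrays of shape `(n_points, 2)`. The sample coordinates and the `serialize_boxes` helper are hypothetical and are not part of `predict_det.py`.

```python
import json

import numpy as np


def serialize_boxes(dt_boxes):
    """Hypothetical helper: convert each box separately, so boxes may
    have different point counts (mirrors the list comprehension in the diff)."""
    return json.dumps([np.asarray(box).tolist() for box in dt_boxes])


# Made-up detections: one 4-point quad and one 6-point polygon.
quad = np.array([[0, 0], [10, 0], [10, 5], [0, 5]], dtype=np.int32)
poly = np.array(
    [[0, 0], [5, -1], [10, 0], [10, 5], [5, 6], [0, 5]], dtype=np.int32
)

# Same line layout as save_pred: file name, a tab, the JSON boxes, a newline.
line = "img_10.jpg" + "\t" + serialize_boxes([quad, poly]) + "\n"
print(line, end="")
```

Note that the new expression also drops the `astype(np.int32)` cast, so coordinates are written in whatever dtype the boxes already have. If the detector returns floating-point boxes, the saved JSON will contain floats rather than integers.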