add `slice` op demo for quickstart (#12439)

pull/12520/head
Wang Xin 2024-05-25 23:26:28 +08:00 committed by GitHub
parent c3648211ea
commit 739400f151
3 changed files with 101 additions and 0 deletions


@ -253,6 +253,46 @@ for idx in range(len(result)):
im_show.save('result_page_{}.jpg'.format(idx))
```
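* Detection and recognition using a sliding window
To run optical character recognition (OCR) with a sliding window, you can use the following code snippet: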
```Python
from paddleocr import PaddleOCR
from PIL import Image, ImageDraw, ImageFont
# Initialize the OCR engine
ocr = PaddleOCR(use_angle_cls=True, lang="en")
img_path = "./very_large_image.jpg"
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
results = ocr.ocr(img_path, cls=True, slice=slice)
# Load the image
image = Image.open(img_path).convert("RGB")
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20)  # Adjust the size as needed
# Process and draw the results
for res in results:
    for line in res:
        box = [tuple(point) for point in line[0]]  # Convert the list of lists to a list of tuples
        # Convert the four corners to two corners (top-left and bottom-right)
        box = [(min(point[0] for point in box), min(point[1] for point in box)),
               (max(point[0] for point in box), max(point[1] for point in box))]
        txt = line[1][0]
        draw.rectangle(box, outline="red", width=2)  # Draw the rectangle
        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw the text above the rectangle
# Save the result
image.save("result.jpg")
```
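This example initializes a PaddleOCR instance with angle classification enabled and sets the language to English. It then calls the `ocr` method with several parameters that customize detection and recognition, including the `slice` parameter, which controls how the image is sliced.
For a fuller description of the slicing operation, see the [slice operation documentation](./slice.md).
## 3. Summary
After working through this section, you should be comfortable using the PaddleOCR whl package and have obtained some initial results.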


@ -0,0 +1,21 @@
# Slice Operation
To run PaddleOCR detection and recognition on a very large image or document, you can use the slice operation, as shown below:
```python
ocr_inst = PaddleOCR(**ocr_settings)
results = ocr_inst.ocr(img, det=True, rec=True, slice=slice, cls=False, bin=False, inv=False, alpha_color=False)
```
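where
`slice = {'horizontal_stride': h_stride, 'vertical_stride': v_stride, 'merge_x_thres': x_thres, 'merge_y_thres': y_thres}`
Here `h_stride`, `v_stride`, `x_thres`, and `y_thres` are user-configurable parameters that must be set manually. The slice operator runs a sliding window over the large image to create slices, then runs the OCR algorithm on each slice.
The fragmented slice-level results are then merged into image-level detection and recognition results. The horizontal and vertical strides should not be set too low, because very small values produce too many slices and make the computation very slow. For example, for an image of size 6616x14886, the following parameters are recommended: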
```python
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
```
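To get a feel for why small strides are expensive, the slice count can be estimated with a short sketch. It is only an illustration and not PaddleOCR's own slicing code: it assumes non-overlapping windows whose size equals the stride, and it reads 6616x14886 as width x height. The helper names `iter_windows` and `count_slices` are hypothetical.

```python
from math import ceil


def iter_windows(width, height, h_stride, v_stride):
    """Yield (left, top, right, bottom) boxes that tile the image.

    Assumption: each window is exactly one stride wide and tall, and
    windows on the right and bottom edges are clipped to the image.
    """
    for top in range(0, height, v_stride):
        for left in range(0, width, h_stride):
            yield (left, top,
                   min(left + h_stride, width),
                   min(top + v_stride, height))


def count_slices(width, height, h_stride, v_stride):
    """Number of windows produced by iter_windows."""
    return ceil(width / h_stride) * ceil(height / v_stride)


# Recommended strides from the text: 23 columns x 30 rows.
print(count_slices(6616, 14886, 300, 500))  # 690
# Halving both strides roughly quadruples the number of OCR passes.
print(count_slices(6616, 14886, 150, 250))  # 2700
```

Under these assumptions, halving both strides takes the work from 690 to 2700 slices, which is the cost the text warns about.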
All slice-level detection results whose bounding boxes lie within `merge_x_thres` and `merge_y_thres` of each other are merged together.
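One plausible reading of this merge rule can be sketched as follows. This is an assumption about the behavior, not PaddleOCR's actual merge routine: two axis-aligned boxes are treated as belonging together when the horizontal gap between them is at most `merge_x_thres` and the vertical gap is at most `merge_y_thres`. The helpers `interval_gap`, `should_merge`, and `merge_boxes` are hypothetical names.

```python
def interval_gap(a_min, a_max, b_min, b_max):
    """Distance between two 1-D intervals; 0 if they overlap or touch."""
    return max(0, max(a_min, b_min) - min(a_max, b_max))


def should_merge(a, b, x_thres, y_thres):
    """Boxes are (x_min, y_min, x_max, y_max).

    Assumed rule: merge when both the horizontal and the vertical gap
    are within their thresholds.
    """
    x_gap = interval_gap(a[0], a[2], b[0], b[2])
    y_gap = interval_gap(a[1], a[3], b[1], b[3])
    return x_gap <= x_thres and y_gap <= y_thres


def merge_boxes(a, b):
    """Smallest axis-aligned box that covers both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


# Two halves of one text line, cut apart by a slice boundary near x = 300.
left_half = (0, 0, 290, 30)
right_half = (310, 2, 600, 32)
# A separate text line further down the page.
next_line = (0, 100, 290, 130)

print(should_merge(left_half, right_half, 50, 35))  # True
print(merge_boxes(left_half, right_half))           # (0, 0, 600, 32)
print(should_merge(left_half, next_line, 50, 35))   # False
```

With the recommended thresholds of 50 and 35, fragments of one line split by a slice boundary are rejoined, while a line 70 pixels lower stays separate.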


@ -266,6 +266,46 @@ for idx in range(len(result)):
im_show.save('result_page_{}.jpg'.format(idx))
```
* Detection and Recognition Using Sliding Windows
To perform OCR with a sliding window, use the following code snippet:
```Python
from paddleocr import PaddleOCR
from PIL import Image, ImageDraw, ImageFont
# Initialize OCR engine
ocr = PaddleOCR(use_angle_cls=True, lang="en")
img_path = "./very_large_image.jpg"
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
results = ocr.ocr(img_path, cls=True, slice=slice)
# Load image
image = Image.open(img_path).convert("RGB")
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20) # Adjust size as needed
# Process and draw results
for res in results:
    for line in res:
        box = [tuple(point) for point in line[0]]  # Convert list of lists to list of tuples
        # Convert four corners to two corners
        box = [(min(point[0] for point in box), min(point[1] for point in box)),
               (max(point[0] for point in box), max(point[1] for point in box))]
        txt = line[1][0]
        draw.rectangle(box, outline="red", width=2)  # Draw rectangle
        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw text above the box
# Save result
image.save("result.jpg")
```
This example initializes the PaddleOCR instance with angle classification enabled and sets the language to English. The `ocr` method is then called with several parameters to customize the detection and recognition process, including the `slice` parameter for handling image slices.
For a more comprehensive understanding of the slicing operation, please refer to the [slice operation documentation](./slice_en.md).
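If only the recognized text is needed rather than an annotated image, the same nested result layout can be flattened into a list. The sketch below runs on hand-written sample data instead of real OCR output. It assumes that `line[1]` is a `(text, score)` pair (the snippet above only relies on `line[1][0]`) and that a page without detections may appear as `None`. The helper `extract_text` is a hypothetical name.

```python
def extract_text(results, min_score=0.0):
    """Collect (text, score) pairs from nested OCR results.

    Assumed layout: results -> pages -> lines, where each line is
    [box, (text, score)].
    """
    pairs = []
    for res in results or []:
        if res is None:  # assumed: a page with no detections may be None
            continue
        for line in res:
            text, score = line[1][0], line[1][1]
            if score >= min_score:
                pairs.append((text, score))
    return pairs


# Hand-written sample in the same nested layout, not real OCR output.
sample = [[
    [[[10, 10], [120, 10], [120, 40], [10, 40]], ("INVOICE", 0.98)],
    [[[10, 60], [90, 60], [90, 85], [10, 85]], ("tota1", 0.41)],
]]

print(extract_text(sample, min_score=0.5))  # [('INVOICE', 0.98)]
```

Filtering on the score can help on very large images, where slice boundaries may produce low-confidence fragments.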
<a name="3"></a>
## 3. Summary