add `slice` op demo for quickstart (#12439)
parent c3648211ea
commit 739400f151
@ -253,6 +253,46 @@ for idx in range(len(result)):
    im_show.save('result_page_{}.jpg'.format(idx))
```

* Detection and recognition using a sliding window

To run optical character recognition (OCR) with a sliding window, use the following code snippet:

```Python
from paddleocr import PaddleOCR
from PIL import Image, ImageDraw, ImageFont

# Initialize the OCR engine
ocr = PaddleOCR(use_angle_cls=True, lang="en")

img_path = "./very_large_image.jpg"
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
results = ocr.ocr(img_path, cls=True, slice=slice)

# Load the image
image = Image.open(img_path).convert("RGB")
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20)  # Adjust the size as needed

# Process and draw the results
for res in results:
    for line in res:
        box = [tuple(point) for point in line[0]]  # Convert the list of lists to a list of tuples
        # Convert four corners to two corners
        box = [(min(point[0] for point in box), min(point[1] for point in box)),
               (max(point[0] for point in box), max(point[1] for point in box))]
        txt = line[1][0]
        draw.rectangle(box, outline="red", width=2)  # Draw the rectangle
        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw the text above the box

# Save the result
image.save("result.jpg")
```
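
This example initializes a PaddleOCR instance with angle classification enabled and the language set to English. It then calls the `ocr` method with several parameters that customize detection and recognition, including the `slice` parameter for handling image slices.

For a more complete description of the slice operation, see the [slice operation documentation](./slice.md).

## 3. Summary

After working through this section, you should be comfortable using the PaddleOCR whl package and have obtained initial results.
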
@ -0,0 +1,21 @@
# Slice Operation

To run PaddleOCR detection and recognition on a very large image or document, use the slice operation as shown below:

```python
from paddleocr import PaddleOCR

# ocr_settings, img, and slice are supplied by the caller
ocr_inst = PaddleOCR(**ocr_settings)
results = ocr_inst.ocr(img, det=True, rec=True, slice=slice, cls=False, bin=False, inv=False, alpha_color=False)
```
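
where

`slice = {'horizontal_stride': h_stride, 'vertical_stride': v_stride, 'merge_x_thres': x_thres, 'merge_y_thres': y_thres}`

Here `h_stride`, `v_stride`, `x_thres`, and `y_thres` are user-configurable parameters that must be set manually. The slice operator works by running a sliding window over the large image, creating slices of the image, and running the OCR algorithm on each slice.

The fragmented slice-level results are then merged into image-level detection and recognition results. The horizontal and vertical strides should not be set too low, because very small values produce too many slices and make the computation very time-consuming. For example, for an image of size 6616x14886, the following parameters are recommended:
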
```python
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
```
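
To get a feel for why small strides are expensive, the number of windows can be estimated with a little arithmetic. The sketch below is only an illustration: `count_slices` is a hypothetical helper, not part of PaddleOCR, and it assumes that each window is one stride wide and one stride tall (non-overlapping tiles) and that 6616 is the image width and 14886 its height.

```python
import math


def count_slices(width, height, horizontal_stride, vertical_stride):
    """Estimate how many windows a stride-sized tiling produces.

    Assumption: windows do not overlap and each one spans exactly one
    stride in each direction, with a partial window at the right and
    bottom edges when the size is not a multiple of the stride.
    """
    cols = math.ceil(width / horizontal_stride)
    rows = math.ceil(height / vertical_stride)
    return cols * rows


# Recommended strides for the 6616x14886 example image
print(count_slices(6616, 14886, 300, 500))  # 23 columns * 30 rows = 690

# Much smaller strides on the same image
print(count_slices(6616, 14886, 100, 100))  # 67 columns * 149 rows = 9983
```

Under these assumptions, dropping both strides to 100 multiplies the number of OCR passes by more than fourteen, which is the cost that the stride recommendation is meant to avoid.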

All slice-level detections whose bounding boxes lie within `merge_x_thres` and `merge_y_thres` of each other are merged together.
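
One way to picture threshold-based merging is the greedy sketch below. It is not PaddleOCR's actual merge routine: the helper names, the `(x_min, y_min, x_max, y_max)` box format, and the rule "merge when both the horizontal and the vertical gap are within the thresholds" are assumptions made for illustration.

```python
def interval_gap(a_min, a_max, b_min, b_max):
    """Distance between two 1-D intervals; 0 when they overlap or touch."""
    return max(0, max(a_min, b_min) - min(a_max, b_max))


def should_merge(a, b, merge_x_thres, merge_y_thres):
    """True when boxes a and b are close enough on both axes."""
    x_gap = interval_gap(a[0], a[2], b[0], b[2])
    y_gap = interval_gap(a[1], a[3], b[1], b[3])
    return x_gap <= merge_x_thres and y_gap <= merge_y_thres


def merge_boxes(boxes, merge_x_thres, merge_y_thres):
    """Greedily union nearby boxes until no pair can be merged."""
    merged = [tuple(box) for box in boxes]
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for i, other in enumerate(result):
                if should_merge(box, other, merge_x_thres, merge_y_thres):
                    # Replace the stored box with the union of the two
                    result[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged


# Two fragments of one text line plus an unrelated box far below
boxes = [(0, 0, 100, 30), (120, 5, 200, 32), (0, 500, 80, 530)]
print(merge_boxes(boxes, merge_x_thres=50, merge_y_thres=35))
# [(0, 0, 200, 32), (0, 500, 80, 530)]
```

In this sketch, larger thresholds join fragments that were split across slice borders more aggressively, at the risk of fusing separate text lines.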
@ -266,6 +266,46 @@ for idx in range(len(result)):
    im_show.save('result_page_{}.jpg'.format(idx))
```

* Detection and Recognition Using Sliding Windows

To perform OCR using sliding windows, the following code snippet can be employed:

```Python
from paddleocr import PaddleOCR
from PIL import Image, ImageDraw, ImageFont

# Initialize OCR engine
ocr = PaddleOCR(use_angle_cls=True, lang="en")

img_path = "./very_large_image.jpg"
slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}
results = ocr.ocr(img_path, cls=True, slice=slice)

# Load image
image = Image.open(img_path).convert("RGB")
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("./doc/fonts/simfang.ttf", size=20)  # Adjust size as needed

# Process and draw results
for res in results:
    for line in res:
        box = [tuple(point) for point in line[0]]  # Convert list of lists to list of tuples
        # Convert four corners to two corners
        box = [(min(point[0] for point in box), min(point[1] for point in box)),
               (max(point[0] for point in box), max(point[1] for point in box))]
        txt = line[1][0]
        draw.rectangle(box, outline="red", width=2)  # Draw rectangle
        draw.text((box[0][0], box[0][1] - 25), txt, fill="blue", font=font)  # Draw text above the box

# Save result
image.save("result.jpg")
```

This example initializes the PaddleOCR instance with angle classification enabled and sets the language to English. The `ocr` method is then called with several parameters to customize the detection and recognition process, including the `slice` parameter for handling image slices.

For a more comprehensive understanding of the slicing operation, please refer to the [slice operation documentation](./slice_en.md).
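
The four-corner to two-corner conversion used when drawing can also be factored into a small helper, which makes it easy to check in isolation. This is a minimal sketch; `quad_to_rect` is a hypothetical name, not a PaddleOCR function, and it assumes each detection box is a list of four `[x, y]` points.

```python
def quad_to_rect(quad):
    """Return the axis-aligned rectangle enclosing a four-point box.

    The result is [(x_min, y_min), (x_max, y_max)], the two-corner form
    that PIL's ImageDraw.rectangle accepts.
    """
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return [(min(xs), min(ys)), (max(xs), max(ys))]


# A slightly skewed text box
quad = [[10.0, 20.0], [110.0, 18.0], [112.0, 48.0], [12.0, 50.0]]
print(quad_to_rect(quad))  # [(10.0, 18.0), (112.0, 50.0)]
```

Because rotated or skewed text is enclosed rather than traced, the drawn rectangle can be somewhat larger than the detected quadrilateral.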

<a name="3"></a>

## 3. Summary