move StyleText to PFCCLab/StyleText (#12121)

pull/12142/head
Wang Xin 2024-05-15 14:12:23 +08:00 committed by GitHub
parent a4b7d3ba4a
commit 3e5934de62
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
57 changed files with 15 additions and 2840 deletions

View File

@ -159,7 +159,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库助力
- [场景应用](./applications)
- 数据标注与合成
- [半自动标注工具PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel/blob/main/README_ch.md)
- [数据合成工具Style-Text](./StyleText/README_ch.md)
- [数据合成工具Style-Text](https://github.com/PFCCLab/StyleText/blob/main/README_ch.md)
- [其它数据标注工具](./doc/doc_ch/data_annotation.md)
- [其它数据合成工具](./doc/doc_ch/data_synthesis.md)
- 数据集

View File

@ -154,7 +154,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库助力
- [场景应用](./applications)
- 数据标注与合成
- [半自动标注工具PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel/blob/main/README_ch.md)
- [数据合成工具Style-Text](./StyleText/README_ch.md)
- [数据合成工具Style-Text](https://github.com/PFCCLab/StyleText/blob/main/README_ch.md)
- [其它数据标注工具](./doc/doc_ch/data_annotation.md)
- [其它数据合成工具](./doc/doc_ch/data_synthesis.md)
- 数据集

View File

@ -164,7 +164,7 @@ Scan the QR code below on WeChat to add operation students, and reply [paddlex],
- [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md)
- Data Annotation and Synthesis
- [Semi-automatic Annotation Tool: PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel/blob/main/README.md)
- [Data Synthesis Tool: Style-Text](./StyleText/README.md)
- [Data Synthesis Tool: Style-Text](https://github.com/PFCCLab/StyleText/blob/main/README.md)
- [Other Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
- [Other Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
- Datasets

View File

@ -1,219 +0,0 @@
English | [简体中文](README_ch.md)
## Style Text
### Contents
- [1. Introduction](#Introduction)
- [2. Preparation](#Preparation)
- [3. Quick Start](#Quick_Start)
- [4. Applications](#Applications)
- [5. Code Structure](#Code_structure)
<a name="Introduction"></a>
### Introduction
<div align="center">
<img src="doc/images/3.png" width="800">
</div>
<div align="center">
<img src="doc/images/9.png" width="600">
</div>
The Style-Text data synthesis tool is a tool based on Baidu and HUST cooperation research work, "Editing Text in the Wild" [https://arxiv.org/abs/1908.03047](https://arxiv.org/abs/1908.03047).
Different from the commonly used GAN-based data synthesis tools, the main framework of Style-Text includes:
* (1) Text foreground style transfer module.
* (2) Background extraction module.
* (3) Fusion module.
After these three steps, you can quickly realize the image text style transfer. The following figure is some results of the data synthesis tool.
<div align="center">
<img src="doc/images/10.png" width="1000">
</div>
<a name="Preparation"></a>
#### Preparation
1. Please refer the [QUICK INSTALLATION](../doc/doc_en/installation_en.md) to install PaddlePaddle. Python3 environment is strongly recommended.
2. Download the pretrained models and unzip:
```bash
cd StyleText
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
unzip style_text_models.zip
```
If you save the model in another location, please modify the address of the model file in `configs/config.yml`, and you need to modify these three configurations at the same time:
```
bg_generator:
pretrain: style_text_models/bg_generator
...
text_generator:
pretrain: style_text_models/text_generator
...
fusion_generator:
pretrain: style_text_models/fusion_generator
```
<a name="Quick_Start"></a>
### Quick Start
#### Synthesis single image
1. You can run `tools/synth_image` and generate the demo image, which is saved in the current folder.
```python
python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
```
* Note 1: The language options is correspond to the corpus. Currently, the tool only supports English(en), Simplified Chinese(ch) and Korean(ko).
* Note 2: Synth-Text is mainly used to generate images for OCR recognition models.
So the height of style images should be around 32 pixels. Images in other sizes may behave poorly.
* Note 3: You can modify `use_gpu` in `configs/config.yml` to determine whether to use GPU for prediction.
For example, enter the following image and corpus `PaddleOCR`.
<div align="center">
<img src="examples/style_images/2.jpg" width="300">
</div>
The result `fake_fusion.jpg` will be generated.
<div align="center">
<img src="doc/images/4.jpg" width="300">
</div>
What's more, the medium result `fake_bg.jpg` will also be saved, which is the background output.
<div align="center">
<img src="doc/images/7.jpg" width="300">
</div>
`fake_text.jpg` * `fake_text.jpg` is the generated image with the same font style as `Style Input`.
<div align="center">
<img src="doc/images/8.jpg" width="300">
</div>
#### Batch synthesis
In actual application scenarios, it is often necessary to synthesize pictures in batches and add them to the training set. StyleText can use a batch of style pictures and corpus to synthesize data in batches. The synthesis process is as follows:
1. The referenced dataset can be specifed in `configs/dataset_config.yml`:
* `Global`
* `output_dir:`Output synthesis data path.
* `StyleSampler`
* `image_home`style images' folder.
* `label_file`Style images' file list. If label is provided, then it is the label file path.
* `with_label`Whether the `label_file` is label file list.
* `CorpusGenerator`
* `method`Method of CorpusGeneratorsupports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is usedNo other configuration is neededotherwise you need to set `corpus_file` and `language`.
* `language`Language of the corpus. Currently, the tool only supports English(en), Simplified Chinese(ch) and Korean(ko).
* `corpus_file`: Filepath of the corpus. Corpus file should be a text file which will be split by line-endings'\n'. Corpus generator samples one line each time.
Example of corpus file:
```
PaddleOCR
飞桨文字识别
StyleText
风格文本图像数据合成
```
We provide a general dataset containing Chinese, English and Korean (50,000 images in all) for your trial ([download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)), some examples are given below :
<div align="center">
<img src="doc/images/5.png" width="800">
</div>
2. You can run the following command to start synthesis task:
``` bash
python3 tools/synth_dataset.py -c configs/dataset_config.yml
```
We also provide example corpus and images in `examples` folder.
<div align="center">
<img src="examples/style_images/1.jpg" width="300">
<img src="examples/style_images/2.jpg" width="300">
</div>
If you run the code above directly, you will get example output data in `output_data` folder.
You will get synthesis images and labels as below:
<div align="center">
<img src="doc/images/12.png" width="800">
</div>
There will be some cache under the `label` folder. If the program exit unexpectedly, you can find cached labels there.
When the program finish normally, you will find all the labels in `label.txt` which give the final results.
<a name="Applications"></a>
### Applications
We take two scenes as examples, which are metal surface English number recognition and general Korean recognition, to illustrate practical cases of using StyleText to synthesize data to improve text recognition. The following figure shows some examples of real scene images and composite images:
<div align="center">
<img src="doc/images/11.png" width="800">
</div>
After adding the above synthetic data for training, the accuracy of the recognition model is improved, which is shown in the following table:
| Scenario | Characters | Raw Data | Test Data | Only Use Raw Data</br>Recognition Accuracy | New Synthetic Data | Simultaneous Use of Synthetic Data</br>Recognition Accuracy | Index Improvement |
| -------- | ---------- | -------- | -------- | -------------------------- | ------------ | ---------------------- | -------- |
| Metal surface | English and numbers | 2203 | 650 | 59.38% | 20000 | 75.46% | 16.08% |
| Random background | Korean | 5631 | 1230 | 30.12% | 100000 | 50.57% | 20.45% |
<a name="Code_structure"></a>
### Code Structure
```
StyleText
|-- arch // Network module files.
| |-- base_module.py
| |-- decoder.py
| |-- encoder.py
| |-- spectral_norm.py
| `-- style_text_rec.py
|-- configs // Config files.
| |-- config.yml
| `-- dataset_config.yml
|-- engine // Synthesis engines.
| |-- corpus_generators.py // Sample corpus from file or generate random corpus.
| |-- predictors.py // Predict using network.
| |-- style_samplers.py // Sample style images.
| |-- synthesisers.py // Manage other engines to synthesis images.
| |-- text_drawers.py // Generate standard input text images.
| `-- writers.py // Write synthesis images and labels into files.
|-- examples // Example files.
| |-- corpus
| | `-- example.txt
| |-- image_list.txt
| `-- style_images
| |-- 1.jpg
| `-- 2.jpg
|-- fonts // Font files.
| |-- ch_standard.ttf
| |-- en_standard.ttf
| `-- ko_standard.ttf
|-- tools // Program entrance.
| |-- __init__.py
| |-- synth_dataset.py // Synthesis dataset.
| `-- synth_image.py // Synthesis image.
`-- utils // Module of basic functions.
|-- config.py
|-- load_params.py
|-- logging.py
|-- math_functions.py
`-- sys_funcs.py
```

View File

@ -1,205 +0,0 @@
简体中文 | [English](README.md)
## Style Text
### 目录
- [一、工具简介](#工具简介)
- [二、环境配置](#环境配置)
- [三、快速上手](#快速上手)
- [四、应用案例](#应用案例)
- [五、代码结构](#代码结构)
<a name="工具简介"></a>
### 一、工具简介
<div align="center">
<img src="doc/images/3.png" width="800">
</div>
<div align="center">
<img src="doc/images/1.png" width="600">
</div>
Style-Text数据合成工具是基于百度和华科合作研发的文本编辑算法《Editing Text in the Wild》https://arxiv.org/abs/1908.03047
不同于常用的基于GAN的数据合成工具Style-Text主要框架包括1.文本前景风格迁移模块 2.背景抽取模块 3.融合模块。经过这样三步,就可以迅速实现图像文本风格迁移。下图是一些该数据合成工具效果图。
<div align="center">
<img src="doc/images/2.png" width="1000">
</div>
<a name="环境配置"></a>
### 二、环境配置
1. 参考[快速安装](../doc/doc_ch/installation.md)安装PaddleOCR。
2. 进入`StyleText`目录,下载模型,并解压:
```bash
cd StyleText
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/style_text_models.zip
unzip style_text_models.zip
```
如果您将模型保存再其他位置,请在`configs/config.yml`中修改模型文件的地址,修改时需要同时修改这三个配置:
```
bg_generator:
pretrain: style_text_models/bg_generator
...
text_generator:
pretrain: style_text_models/text_generator
...
fusion_generator:
pretrain: style_text_models/fusion_generator
```
<a name="快速上手"></a>
### 三、快速上手
#### 合成单张图
输入一张风格图和一段文字语料运行tools/synth_image合成单张图片结果图像保存在当前目录下
```python
python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
```
* 注1语言选项和语料相对应目前支持英文(en)、简体中文(ch)和韩语(ko)。
* 注2Style-Text生成的数据主要应用于OCR识别场景。基于当前PaddleOCR识别模型的设计我们主要支持高度在32左右的风格图像。
如果输入图像尺寸相差过多,效果可能不佳。
* 注3可以通过修改配置文件`configs/config.yml`中的`use_gpu`(true或者false)参数来决定是否使用GPU进行预测。
例如,输入如下图片和语料"PaddleOCR":
<div align="center">
<img src="examples/style_images/2.jpg" width="300">
</div>
生成合成数据`fake_fusion.jpg`
<div align="center">
<img src="doc/images/4.jpg" width="300">
</div>
除此之外,程序还会生成并保存中间结果`fake_bg.jpg`:为风格参考图去掉文字后的背景;
<div align="center">
<img src="doc/images/7.jpg" width="300">
</div>
`fake_text.jpg`:是用提供的字符串,仿照风格参考图中文字的风格,生成在灰色背景上的文字图片。
<div align="center">
<img src="doc/images/8.jpg" width="300">
</div>
#### 批量合成
在实际应用场景中经常需要批量合成图片补充到训练集中。Style-Text可以使用一批风格图片和语料批量合成数据。合成过程如下
1. 在`configs/dataset_config.yml`中配置目标场景风格图像和语料的路径,具体如下:
* `Global`
* `output_dir:`:保存合成数据的目录。
* `StyleSampler`
* `image_home`:风格图片目录;
* `label_file`风格图片路径列表文件如果所用数据集有label则label_file为label文件路径
* `with_label`:标志`label_file`是否为label文件。
* `CorpusGenerator`
* `method`:语料生成方法,目前有`FileCorpus`和`EnNumCorpus`可选。如果使用`EnNumCorpus`,则不需要填写其他配置,否则需要修改`corpus_file`和`language`
* `language`:语料的语种,目前支持英文(en)、简体中文(ch)和韩语(ko)
* `corpus_file`: 语料文件路径。语料文件应使用文本文件。语料生成器首先会将语料按行切分,之后每次随机选取一行。
语料文件格式示例:
```
PaddleOCR
飞桨文字识别
StyleText
风格文本图像数据合成
...
```
Style-Text也提供了一批中英韩5万张通用场景数据用作文本风格图像便于合成场景丰富的文本图像下图给出了一些示例。
中英韩5万张通用场景数据: [下载地址](https://paddleocr.bj.bcebos.com/dygraph_v2.0/style_text/chkoen_5w.tar)
<div align="center">
<img src="doc/images/5.png" width="800">
</div>
2. 运行`tools/synth_dataset`合成数据:
``` bash
python3 tools/synth_dataset.py -c configs/dataset_config.yml
```
我们在examples目录下提供了样例图片和语料。
<div align="center">
<img src="examples/style_images/1.jpg" width="300">
<img src="examples/style_images/2.jpg" width="300">
</div>
直接运行上述命令可以在output_data中产生样例输出包括图片和用于训练识别模型的标注文件
<div align="center">
<img src="doc/images/12.png" width="800">
</div>
其中label目录下的标注文件为程序运行过程中产生的缓存如果程序在中途异常终止可以使用缓存的标注文件。
如果程序正常运行完毕则会在output_data下生成label.txt为最终的标注结果。
<a name="应用案例"></a>
### 四、应用案例
下面以金属表面英文数字识别和通用韩语识别两个场景为例说明使用Style-Text合成数据来提升文本识别效果的实际案例。下图给出了一些真实场景图像和合成图像的示例
<div align="center">
<img src="doc/images/6.png" width="800">
</div>
在添加上述合成数据进行训练后,识别模型的效果提升,如下表所示:
| 场景 | 字符 | 原始数据 | 测试数据 | 只使用原始数据</br>识别准确率 | 新增合成数据 | 同时使用合成数据</br>识别准确率 | 指标提升 |
| -------- | ---------- | -------- | -------- | -------------------------- | ------------ | ---------------------- | -------- |
| 金属表面 | 英文和数字 | 2203 | 650 | 59.38% | 20000 | 75.46% | 16.08% |
| 随机背景 | 韩语 | 5631 | 1230 | 30.12% | 100000 | 50.57% | 20.45% |
<a name="代码结构"></a>
### 五、代码结构
```
StyleText
|-- arch // 网络结构定义文件
| |-- base_module.py
| |-- decoder.py
| |-- encoder.py
| |-- spectral_norm.py
| `-- style_text_rec.py
|-- configs // 配置文件
| |-- config.yml
| `-- dataset_config.yml
|-- engine // 数据合成引擎
| |-- corpus_generators.py // 从文本采样或随机生成语料
| |-- predictors.py // 调用网络生成数据
| |-- style_samplers.py // 采样风格图片
| |-- synthesisers.py // 调度各个模块,合成数据
| |-- text_drawers.py // 生成标准文字图片,用作输入
| `-- writers.py // 将合成的图片和标签写入本地目录
|-- examples // 示例文件
| |-- corpus
| | `-- example.txt
| |-- image_list.txt
| `-- style_images
| |-- 1.jpg
| `-- 2.jpg
|-- fonts // 字体文件
| |-- ch_standard.ttf
| |-- en_standard.ttf
| `-- ko_standard.ttf
|-- tools // 程序入口
| |-- __init__.py
| |-- synth_dataset.py // 批量合成数据
| `-- synth_image.py // 合成单张图片
`-- utils // 其他基础功能模块
|-- config.py
|-- load_params.py
|-- logging.py
|-- math_functions.py
`-- sys_funcs.py
```

View File

View File

@ -1,268 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
from arch.spectral_norm import spectral_norm
class CBN(nn.Layer):
def __init__(
self,
name,
in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
dilation=1,
groups=1,
use_bias=False,
norm_layer=None,
act=None,
act_attr=None,
):
super(CBN, self).__init__()
if use_bias:
bias_attr = paddle.ParamAttr(name=name + "_bias")
else:
bias_attr = None
self._conv = paddle.nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
weight_attr=paddle.ParamAttr(name=name + "_weights"),
bias_attr=bias_attr,
)
if norm_layer:
self._norm_layer = getattr(paddle.nn, norm_layer)(
num_features=out_channels, name=name + "_bn"
)
else:
self._norm_layer = None
if act:
if act_attr:
self._act = getattr(paddle.nn, act)(**act_attr, name=name + "_" + act)
else:
self._act = getattr(paddle.nn, act)(name=name + "_" + act)
else:
self._act = None
def forward(self, x):
out = self._conv(x)
if self._norm_layer:
out = self._norm_layer(out)
if self._act:
out = self._act(out)
return out
class SNConv(nn.Layer):
def __init__(
self,
name,
in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
dilation=1,
groups=1,
use_bias=False,
norm_layer=None,
act=None,
act_attr=None,
):
super(SNConv, self).__init__()
if use_bias:
bias_attr = paddle.ParamAttr(name=name + "_bias")
else:
bias_attr = None
self._sn_conv = spectral_norm(
paddle.nn.Conv2D(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
weight_attr=paddle.ParamAttr(name=name + "_weights"),
bias_attr=bias_attr,
)
)
if norm_layer:
self._norm_layer = getattr(paddle.nn, norm_layer)(
num_features=out_channels, name=name + "_bn"
)
else:
self._norm_layer = None
if act:
if act_attr:
self._act = getattr(paddle.nn, act)(**act_attr, name=name + "_" + act)
else:
self._act = getattr(paddle.nn, act)(name=name + "_" + act)
else:
self._act = None
def forward(self, x):
out = self._sn_conv(x)
if self._norm_layer:
out = self._norm_layer(out)
if self._act:
out = self._act(out)
return out
class SNConvTranspose(nn.Layer):
def __init__(
self,
name,
in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
output_padding=0,
dilation=1,
groups=1,
use_bias=False,
norm_layer=None,
act=None,
act_attr=None,
):
super(SNConvTranspose, self).__init__()
if use_bias:
bias_attr = paddle.ParamAttr(name=name + "_bias")
else:
bias_attr = None
self._sn_conv_transpose = spectral_norm(
paddle.nn.Conv2DTranspose(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
output_padding=output_padding,
dilation=dilation,
groups=groups,
weight_attr=paddle.ParamAttr(name=name + "_weights"),
bias_attr=bias_attr,
)
)
if norm_layer:
self._norm_layer = getattr(paddle.nn, norm_layer)(
num_features=out_channels, name=name + "_bn"
)
else:
self._norm_layer = None
if act:
if act_attr:
self._act = getattr(paddle.nn, act)(**act_attr, name=name + "_" + act)
else:
self._act = getattr(paddle.nn, act)(name=name + "_" + act)
else:
self._act = None
def forward(self, x):
out = self._sn_conv_transpose(x)
if self._norm_layer:
out = self._norm_layer(out)
if self._act:
out = self._act(out)
return out
class MiddleNet(nn.Layer):
def __init__(self, name, in_channels, mid_channels, out_channels, use_bias):
super(MiddleNet, self).__init__()
self._sn_conv1 = SNConv(
name=name + "_sn_conv1",
in_channels=in_channels,
out_channels=mid_channels,
kernel_size=1,
use_bias=use_bias,
norm_layer=None,
act=None,
)
self._pad2d = nn.Pad2D(padding=[1, 1, 1, 1], mode="replicate")
self._sn_conv2 = SNConv(
name=name + "_sn_conv2",
in_channels=mid_channels,
out_channels=mid_channels,
kernel_size=3,
use_bias=use_bias,
)
self._sn_conv3 = SNConv(
name=name + "_sn_conv3",
in_channels=mid_channels,
out_channels=out_channels,
kernel_size=1,
use_bias=use_bias,
)
def forward(self, x):
sn_conv1 = self._sn_conv1.forward(x)
pad_2d = self._pad2d.forward(sn_conv1)
sn_conv2 = self._sn_conv2.forward(pad_2d)
sn_conv3 = self._sn_conv3.forward(sn_conv2)
return sn_conv3
class ResBlock(nn.Layer):
def __init__(self, name, channels, norm_layer, use_dropout, use_dilation, use_bias):
super(ResBlock, self).__init__()
if use_dilation:
padding_mat = [1, 1, 1, 1]
else:
padding_mat = [0, 0, 0, 0]
self._pad1 = nn.Pad2D(padding_mat, mode="replicate")
self._sn_conv1 = SNConv(
name=name + "_sn_conv1",
in_channels=channels,
out_channels=channels,
kernel_size=3,
padding=0,
norm_layer=norm_layer,
use_bias=use_bias,
act="ReLU",
act_attr=None,
)
if use_dropout:
self._dropout = nn.Dropout(0.5)
else:
self._dropout = None
self._pad2 = nn.Pad2D([1, 1, 1, 1], mode="replicate")
self._sn_conv2 = SNConv(
name=name + "_sn_conv2",
in_channels=channels,
out_channels=channels,
kernel_size=3,
norm_layer=norm_layer,
use_bias=use_bias,
act="ReLU",
act_attr=None,
)
def forward(self, x):
pad1 = self._pad1.forward(x)
sn_conv1 = self._sn_conv1.forward(pad1)
pad2 = self._pad2.forward(sn_conv1)
sn_conv2 = self._sn_conv2.forward(pad2)
return sn_conv2 + x

View File

@ -1,303 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
from arch.base_module import SNConv, SNConvTranspose, ResBlock
class Decoder(nn.Layer):
def __init__(
self,
name,
encode_dim,
out_channels,
use_bias,
norm_layer,
act,
act_attr,
conv_block_dropout,
conv_block_num,
conv_block_dilation,
out_conv_act,
out_conv_act_attr,
):
super(Decoder, self).__init__()
conv_blocks = []
for i in range(conv_block_num):
conv_blocks.append(
ResBlock(
name="{}_conv_block_{}".format(name, i),
channels=encode_dim * 8,
norm_layer=norm_layer,
use_dropout=conv_block_dropout,
use_dilation=conv_block_dilation,
use_bias=use_bias,
)
)
self.conv_blocks = nn.Sequential(*conv_blocks)
self._up1 = SNConvTranspose(
name=name + "_up1",
in_channels=encode_dim * 8,
out_channels=encode_dim * 4,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up2 = SNConvTranspose(
name=name + "_up2",
in_channels=encode_dim * 4,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up3 = SNConvTranspose(
name=name + "_up3",
in_channels=encode_dim * 2,
out_channels=encode_dim,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate")
self._out_conv = SNConv(
name=name + "_out_conv",
in_channels=encode_dim,
out_channels=out_channels,
kernel_size=3,
use_bias=use_bias,
norm_layer=None,
act=out_conv_act,
act_attr=out_conv_act_attr,
)
def forward(self, x):
if isinstance(x, (list, tuple)):
x = paddle.concat(x, axis=1)
output_dict = dict()
output_dict["conv_blocks"] = self.conv_blocks.forward(x)
output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"])
output_dict["up2"] = self._up2.forward(output_dict["up1"])
output_dict["up3"] = self._up3.forward(output_dict["up2"])
output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"])
output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"])
return output_dict
class DecoderUnet(nn.Layer):
def __init__(
self,
name,
encode_dim,
out_channels,
use_bias,
norm_layer,
act,
act_attr,
conv_block_dropout,
conv_block_num,
conv_block_dilation,
out_conv_act,
out_conv_act_attr,
):
super(DecoderUnet, self).__init__()
conv_blocks = []
for i in range(conv_block_num):
conv_blocks.append(
ResBlock(
name="{}_conv_block_{}".format(name, i),
channels=encode_dim * 8,
norm_layer=norm_layer,
use_dropout=conv_block_dropout,
use_dilation=conv_block_dilation,
use_bias=use_bias,
)
)
self._conv_blocks = nn.Sequential(*conv_blocks)
self._up1 = SNConvTranspose(
name=name + "_up1",
in_channels=encode_dim * 8,
out_channels=encode_dim * 4,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up2 = SNConvTranspose(
name=name + "_up2",
in_channels=encode_dim * 8,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up3 = SNConvTranspose(
name=name + "_up3",
in_channels=encode_dim * 4,
out_channels=encode_dim,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate")
self._out_conv = SNConv(
name=name + "_out_conv",
in_channels=encode_dim,
out_channels=out_channels,
kernel_size=3,
use_bias=use_bias,
norm_layer=None,
act=out_conv_act,
act_attr=out_conv_act_attr,
)
def forward(self, x, y, feature2, feature1):
output_dict = dict()
output_dict["conv_blocks"] = self._conv_blocks(paddle.concat((x, y), axis=1))
output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"])
output_dict["up2"] = self._up2.forward(
paddle.concat((output_dict["up1"], feature2), axis=1)
)
output_dict["up3"] = self._up3.forward(
paddle.concat((output_dict["up2"], feature1), axis=1)
)
output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"])
output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"])
return output_dict
class SingleDecoder(nn.Layer):
def __init__(
self,
name,
encode_dim,
out_channels,
use_bias,
norm_layer,
act,
act_attr,
conv_block_dropout,
conv_block_num,
conv_block_dilation,
out_conv_act,
out_conv_act_attr,
):
super(SingleDecoder, self).__init__()
conv_blocks = []
for i in range(conv_block_num):
conv_blocks.append(
ResBlock(
name="{}_conv_block_{}".format(name, i),
channels=encode_dim * 4,
norm_layer=norm_layer,
use_dropout=conv_block_dropout,
use_dilation=conv_block_dilation,
use_bias=use_bias,
)
)
self._conv_blocks = nn.Sequential(*conv_blocks)
self._up1 = SNConvTranspose(
name=name + "_up1",
in_channels=encode_dim * 4,
out_channels=encode_dim * 4,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up2 = SNConvTranspose(
name=name + "_up2",
in_channels=encode_dim * 8,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up3 = SNConvTranspose(
name=name + "_up3",
in_channels=encode_dim * 4,
out_channels=encode_dim,
kernel_size=3,
stride=2,
padding=1,
output_padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._pad2d = paddle.nn.Pad2D([1, 1, 1, 1], mode="replicate")
self._out_conv = SNConv(
name=name + "_out_conv",
in_channels=encode_dim,
out_channels=out_channels,
kernel_size=3,
use_bias=use_bias,
norm_layer=None,
act=out_conv_act,
act_attr=out_conv_act_attr,
)
def forward(self, x, feature2, feature1):
output_dict = dict()
output_dict["conv_blocks"] = self._conv_blocks.forward(x)
output_dict["up1"] = self._up1.forward(output_dict["conv_blocks"])
output_dict["up2"] = self._up2.forward(
paddle.concat((output_dict["up1"], feature2), axis=1)
)
output_dict["up3"] = self._up3.forward(
paddle.concat((output_dict["up2"], feature1), axis=1)
)
output_dict["pad2d"] = self._pad2d.forward(output_dict["up3"])
output_dict["out_conv"] = self._out_conv.forward(output_dict["pad2d"])
return output_dict

View File

@ -1,211 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
from arch.base_module import SNConv, SNConvTranspose, ResBlock
class Encoder(nn.Layer):
def __init__(
self,
name,
in_channels,
encode_dim,
use_bias,
norm_layer,
act,
act_attr,
conv_block_dropout,
conv_block_num,
conv_block_dilation,
):
super(Encoder, self).__init__()
self._pad2d = paddle.nn.Pad2D([3, 3, 3, 3], mode="replicate")
self._in_conv = SNConv(
name=name + "_in_conv",
in_channels=in_channels,
out_channels=encode_dim,
kernel_size=7,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._down1 = SNConv(
name=name + "_down1",
in_channels=encode_dim,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._down2 = SNConv(
name=name + "_down2",
in_channels=encode_dim * 2,
out_channels=encode_dim * 4,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._down3 = SNConv(
name=name + "_down3",
in_channels=encode_dim * 4,
out_channels=encode_dim * 4,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
conv_blocks = []
for i in range(conv_block_num):
conv_blocks.append(
ResBlock(
name="{}_conv_block_{}".format(name, i),
channels=encode_dim * 4,
norm_layer=norm_layer,
use_dropout=conv_block_dropout,
use_dilation=conv_block_dilation,
use_bias=use_bias,
)
)
self._conv_blocks = nn.Sequential(*conv_blocks)
def forward(self, x):
out_dict = dict()
x = self._pad2d(x)
out_dict["in_conv"] = self._in_conv.forward(x)
out_dict["down1"] = self._down1.forward(out_dict["in_conv"])
out_dict["down2"] = self._down2.forward(out_dict["down1"])
out_dict["down3"] = self._down3.forward(out_dict["down2"])
out_dict["res_blocks"] = self._conv_blocks.forward(out_dict["down3"])
return out_dict
class EncoderUnet(nn.Layer):
def __init__(
self, name, in_channels, encode_dim, use_bias, norm_layer, act, act_attr
):
super(EncoderUnet, self).__init__()
self._pad2d = paddle.nn.Pad2D([3, 3, 3, 3], mode="replicate")
self._in_conv = SNConv(
name=name + "_in_conv",
in_channels=in_channels,
out_channels=encode_dim,
kernel_size=7,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._down1 = SNConv(
name=name + "_down1",
in_channels=encode_dim,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._down2 = SNConv(
name=name + "_down2",
in_channels=encode_dim * 2,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._down3 = SNConv(
name=name + "_down3",
in_channels=encode_dim * 2,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._down4 = SNConv(
name=name + "_down4",
in_channels=encode_dim * 2,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up1 = SNConvTranspose(
name=name + "_up1",
in_channels=encode_dim * 2,
out_channels=encode_dim * 2,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
self._up2 = SNConvTranspose(
name=name + "_up2",
in_channels=encode_dim * 4,
out_channels=encode_dim * 4,
kernel_size=3,
stride=2,
padding=1,
use_bias=use_bias,
norm_layer=norm_layer,
act=act,
act_attr=act_attr,
)
def forward(self, x):
output_dict = dict()
x = self._pad2d(x)
output_dict["in_conv"] = self._in_conv.forward(x)
output_dict["down1"] = self._down1.forward(output_dict["in_conv"])
output_dict["down2"] = self._down2.forward(output_dict["down1"])
output_dict["down3"] = self._down3.forward(output_dict["down2"])
output_dict["down4"] = self._down4.forward(output_dict["down3"])
output_dict["up1"] = self._up1.forward(output_dict["down4"])
output_dict["up2"] = self._up2.forward(
paddle.concat((output_dict["down3"], output_dict["up1"]), axis=1)
)
output_dict["concat"] = paddle.concat(
(output_dict["down2"], output_dict["up2"]), axis=1
)
return output_dict

View File

@ -1,150 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
def normal_(x, mean=0.0, std=1.0):
temp_value = paddle.normal(mean, std, shape=x.shape)
x.set_value(temp_value)
return x
class SpectralNorm(object):
def __init__(self, name="weight", n_power_iterations=1, dim=0, eps=1e-12):
self.name = name
self.dim = dim
if n_power_iterations <= 0:
raise ValueError(
"Expected n_power_iterations to be positive, but "
"got n_power_iterations={}".format(n_power_iterations)
)
self.n_power_iterations = n_power_iterations
self.eps = eps
def reshape_weight_to_matrix(self, weight):
weight_mat = weight
if self.dim != 0:
# transpose dim to front
weight_mat = weight_mat.transpose(
[self.dim, *[d for d in range(weight_mat.dim()) if d != self.dim]]
)
height = weight_mat.shape[0]
return weight_mat.reshape([height, -1])
def compute_weight(self, module, do_power_iteration):
weight = getattr(module, self.name + "_orig")
u = getattr(module, self.name + "_u")
v = getattr(module, self.name + "_v")
weight_mat = self.reshape_weight_to_matrix(weight)
if do_power_iteration:
with paddle.no_grad():
for _ in range(self.n_power_iterations):
v.set_value(
F.normalize(
paddle.matmul(
weight_mat, u, transpose_x=True, transpose_y=False
),
axis=0,
epsilon=self.eps,
)
)
u.set_value(
F.normalize(
paddle.matmul(weight_mat, v),
axis=0,
epsilon=self.eps,
)
)
if self.n_power_iterations > 0:
u = u.clone()
v = v.clone()
sigma = paddle.dot(u, paddle.mv(weight_mat, v))
weight = weight / sigma
return weight
def remove(self, module):
with paddle.no_grad():
weight = self.compute_weight(module, do_power_iteration=False)
delattr(module, self.name)
delattr(module, self.name + "_u")
delattr(module, self.name + "_v")
delattr(module, self.name + "_orig")
module.add_parameter(self.name, weight.detach())
def __call__(self, module, inputs):
setattr(
module,
self.name,
self.compute_weight(module, do_power_iteration=module.training),
)
@staticmethod
def apply(module, name, n_power_iterations, dim, eps):
for k, hook in module._forward_pre_hooks.items():
if isinstance(hook, SpectralNorm) and hook.name == name:
raise RuntimeError(
"Cannot register two spectral_norm hooks on "
"the same parameter {}".format(name)
)
fn = SpectralNorm(name, n_power_iterations, dim, eps)
weight = module._parameters[name]
with paddle.no_grad():
weight_mat = fn.reshape_weight_to_matrix(weight)
h, w = weight_mat.shape
# randomly initialize u and v
u = module.create_parameter([h])
u = normal_(u, 0.0, 1.0)
v = module.create_parameter([w])
v = normal_(v, 0.0, 1.0)
u = F.normalize(u, axis=0, epsilon=fn.eps)
v = F.normalize(v, axis=0, epsilon=fn.eps)
# delete fn.name form parameters, otherwise you can not set attribute
del module._parameters[fn.name]
module.add_parameter(fn.name + "_orig", weight)
# still need to assign weight back as fn.name because all sorts of
# things may assume that it exists, e.g., when initializing weights.
# However, we can't directly assign as it could be an Parameter and
# gets added as a parameter. Instead, we register weight * 1.0 as a plain
# attribute.
setattr(module, fn.name, weight * 1.0)
module.register_buffer(fn.name + "_u", u)
module.register_buffer(fn.name + "_v", v)
module.register_forward_pre_hook(fn)
return fn
def spectral_norm(module, name="weight", n_power_iterations=1, eps=1e-12, dim=None):
if dim is None:
if isinstance(
module,
(nn.Conv1DTranspose, nn.Conv2DTranspose, nn.Conv3DTranspose, nn.Linear),
):
dim = 1
else:
dim = 0
SpectralNorm.apply(module, name, n_power_iterations, dim, eps)
return module

View File

@ -1,298 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
from arch.base_module import MiddleNet, ResBlock
from arch.encoder import Encoder
from arch.decoder import Decoder, DecoderUnet, SingleDecoder
from utils.load_params import load_dygraph_pretrain
from utils.logging import get_logger
class StyleTextRec(nn.Layer):
def __init__(self, config):
super(StyleTextRec, self).__init__()
self.logger = get_logger()
self.text_generator = TextGenerator(config["Predictor"]["text_generator"])
self.bg_generator = BgGeneratorWithMask(config["Predictor"]["bg_generator"])
self.fusion_generator = FusionGeneratorSimple(
config["Predictor"]["fusion_generator"]
)
bg_generator_pretrain = config["Predictor"]["bg_generator"]["pretrain"]
text_generator_pretrain = config["Predictor"]["text_generator"]["pretrain"]
fusion_generator_pretrain = config["Predictor"]["fusion_generator"]["pretrain"]
load_dygraph_pretrain(
self.bg_generator,
self.logger,
path=bg_generator_pretrain,
load_static_weights=False,
)
load_dygraph_pretrain(
self.text_generator,
self.logger,
path=text_generator_pretrain,
load_static_weights=False,
)
load_dygraph_pretrain(
self.fusion_generator,
self.logger,
path=fusion_generator_pretrain,
load_static_weights=False,
)
def forward(self, style_input, text_input):
text_gen_output = self.text_generator.forward(style_input, text_input)
fake_text = text_gen_output["fake_text"]
fake_sk = text_gen_output["fake_sk"]
bg_gen_output = self.bg_generator.forward(style_input)
bg_encode_feature = bg_gen_output["bg_encode_feature"]
bg_decode_feature1 = bg_gen_output["bg_decode_feature1"]
bg_decode_feature2 = bg_gen_output["bg_decode_feature2"]
fake_bg = bg_gen_output["fake_bg"]
fusion_gen_output = self.fusion_generator.forward(fake_text, fake_bg)
fake_fusion = fusion_gen_output["fake_fusion"]
return {
"fake_fusion": fake_fusion,
"fake_text": fake_text,
"fake_sk": fake_sk,
"fake_bg": fake_bg,
}
class TextGenerator(nn.Layer):
def __init__(self, config):
super(TextGenerator, self).__init__()
name = config["module_name"]
encode_dim = config["encode_dim"]
norm_layer = config["norm_layer"]
conv_block_dropout = config["conv_block_dropout"]
conv_block_num = config["conv_block_num"]
conv_block_dilation = config["conv_block_dilation"]
if norm_layer == "InstanceNorm2D":
use_bias = True
else:
use_bias = False
self.encoder_text = Encoder(
name=name + "_encoder_text",
in_channels=3,
encode_dim=encode_dim,
use_bias=use_bias,
norm_layer=norm_layer,
act="ReLU",
act_attr=None,
conv_block_dropout=conv_block_dropout,
conv_block_num=conv_block_num,
conv_block_dilation=conv_block_dilation,
)
self.encoder_style = Encoder(
name=name + "_encoder_style",
in_channels=3,
encode_dim=encode_dim,
use_bias=use_bias,
norm_layer=norm_layer,
act="ReLU",
act_attr=None,
conv_block_dropout=conv_block_dropout,
conv_block_num=conv_block_num,
conv_block_dilation=conv_block_dilation,
)
self.decoder_text = Decoder(
name=name + "_decoder_text",
encode_dim=encode_dim,
out_channels=int(encode_dim / 2),
use_bias=use_bias,
norm_layer=norm_layer,
act="ReLU",
act_attr=None,
conv_block_dropout=conv_block_dropout,
conv_block_num=conv_block_num,
conv_block_dilation=conv_block_dilation,
out_conv_act="Tanh",
out_conv_act_attr=None,
)
self.decoder_sk = Decoder(
name=name + "_decoder_sk",
encode_dim=encode_dim,
out_channels=1,
use_bias=use_bias,
norm_layer=norm_layer,
act="ReLU",
act_attr=None,
conv_block_dropout=conv_block_dropout,
conv_block_num=conv_block_num,
conv_block_dilation=conv_block_dilation,
out_conv_act="Sigmoid",
out_conv_act_attr=None,
)
self.middle = MiddleNet(
name=name + "_middle_net",
in_channels=int(encode_dim / 2) + 1,
mid_channels=encode_dim,
out_channels=3,
use_bias=use_bias,
)
def forward(self, style_input, text_input):
style_feature = self.encoder_style.forward(style_input)["res_blocks"]
text_feature = self.encoder_text.forward(text_input)["res_blocks"]
fake_c_temp = self.decoder_text.forward([text_feature, style_feature])[
"out_conv"
]
fake_sk = self.decoder_sk.forward([text_feature, style_feature])["out_conv"]
fake_text = self.middle(paddle.concat((fake_c_temp, fake_sk), axis=1))
return {"fake_sk": fake_sk, "fake_text": fake_text}
class BgGeneratorWithMask(nn.Layer):
def __init__(self, config):
super(BgGeneratorWithMask, self).__init__()
name = config["module_name"]
encode_dim = config["encode_dim"]
norm_layer = config["norm_layer"]
conv_block_dropout = config["conv_block_dropout"]
conv_block_num = config["conv_block_num"]
conv_block_dilation = config["conv_block_dilation"]
self.output_factor = config.get("output_factor", 1.0)
if norm_layer == "InstanceNorm2D":
use_bias = True
else:
use_bias = False
self.encoder_bg = Encoder(
name=name + "_encoder_bg",
in_channels=3,
encode_dim=encode_dim,
use_bias=use_bias,
norm_layer=norm_layer,
act="ReLU",
act_attr=None,
conv_block_dropout=conv_block_dropout,
conv_block_num=conv_block_num,
conv_block_dilation=conv_block_dilation,
)
self.decoder_bg = SingleDecoder(
name=name + "_decoder_bg",
encode_dim=encode_dim,
out_channels=3,
use_bias=use_bias,
norm_layer=norm_layer,
act="ReLU",
act_attr=None,
conv_block_dropout=conv_block_dropout,
conv_block_num=conv_block_num,
conv_block_dilation=conv_block_dilation,
out_conv_act="Tanh",
out_conv_act_attr=None,
)
self.decoder_mask = Decoder(
name=name + "_decoder_mask",
encode_dim=encode_dim // 2,
out_channels=1,
use_bias=use_bias,
norm_layer=norm_layer,
act="ReLU",
act_attr=None,
conv_block_dropout=conv_block_dropout,
conv_block_num=conv_block_num,
conv_block_dilation=conv_block_dilation,
out_conv_act="Sigmoid",
out_conv_act_attr=None,
)
self.middle = MiddleNet(
name=name + "_middle_net",
in_channels=3 + 1,
mid_channels=encode_dim,
out_channels=3,
use_bias=use_bias,
)
def forward(self, style_input):
encode_bg_output = self.encoder_bg(style_input)
decode_bg_output = self.decoder_bg(
encode_bg_output["res_blocks"],
encode_bg_output["down2"],
encode_bg_output["down1"],
)
fake_c_temp = decode_bg_output["out_conv"]
fake_bg_mask = self.decoder_mask.forward(encode_bg_output["res_blocks"])[
"out_conv"
]
fake_bg = self.middle(paddle.concat((fake_c_temp, fake_bg_mask), axis=1))
return {
"bg_encode_feature": encode_bg_output["res_blocks"],
"bg_decode_feature1": decode_bg_output["up1"],
"bg_decode_feature2": decode_bg_output["up2"],
"fake_bg": fake_bg,
"fake_bg_mask": fake_bg_mask,
}
class FusionGeneratorSimple(nn.Layer):
def __init__(self, config):
super(FusionGeneratorSimple, self).__init__()
name = config["module_name"]
encode_dim = config["encode_dim"]
norm_layer = config["norm_layer"]
conv_block_dropout = config["conv_block_dropout"]
conv_block_dilation = config["conv_block_dilation"]
if norm_layer == "InstanceNorm2D":
use_bias = True
else:
use_bias = False
self._conv = nn.Conv2D(
in_channels=6,
out_channels=encode_dim,
kernel_size=3,
stride=1,
padding=1,
groups=1,
weight_attr=paddle.ParamAttr(name=name + "_conv_weights"),
bias_attr=False,
)
self._res_block = ResBlock(
name="{}_conv_block".format(name),
channels=encode_dim,
norm_layer=norm_layer,
use_dropout=conv_block_dropout,
use_dilation=conv_block_dilation,
use_bias=use_bias,
)
self._reduce_conv = nn.Conv2D(
in_channels=encode_dim,
out_channels=3,
kernel_size=3,
stride=1,
padding=1,
groups=1,
weight_attr=paddle.ParamAttr(name=name + "_reduce_conv_weights"),
bias_attr=False,
)
def forward(self, fake_text, fake_bg):
fake_concat = paddle.concat((fake_text, fake_bg), axis=1)
fake_concat_tmp = self._conv(fake_concat)
output_res = self._res_block(fake_concat_tmp)
fake_fusion = self._reduce_conv(output_res)
return {"fake_fusion": fake_fusion}

View File

@ -1,54 +0,0 @@
Global:
output_num: 10
output_dir: output_data
use_gpu: false
image_height: 32
image_width: 320
TextDrawer:
fonts:
en: fonts/en_standard.ttf
ch: fonts/ch_standard.ttf
ko: fonts/ko_standard.ttf
Predictor:
method: StyleTextRecPredictor
algorithm: StyleTextRec
scale: 0.00392156862745098
mean:
- 0.5
- 0.5
- 0.5
std:
- 0.5
- 0.5
- 0.5
expand_result: false
bg_generator:
pretrain: style_text_models/bg_generator
module_name: bg_generator
generator_type: BgGeneratorWithMask
encode_dim: 64
norm_layer: null
conv_block_num: 4
conv_block_dropout: false
conv_block_dilation: true
output_factor: 1.05
text_generator:
pretrain: style_text_models/text_generator
module_name: text_generator
generator_type: TextGenerator
encode_dim: 64
norm_layer: InstanceNorm2D
conv_block_num: 4
conv_block_dropout: false
conv_block_dilation: true
fusion_generator:
pretrain: style_text_models/fusion_generator
module_name: fusion_generator
generator_type: FusionGeneratorSimple
encode_dim: 64
norm_layer: null
conv_block_num: 4
conv_block_dropout: false
conv_block_dilation: true
Writer:
method: SimpleWriter

View File

@ -1,64 +0,0 @@
Global:
output_num: 10
output_dir: output_data
use_gpu: false
image_height: 32
image_width: 320
standard_font: fonts/en_standard.ttf
TextDrawer:
fonts:
en: fonts/en_standard.ttf
ch: fonts/ch_standard.ttf
ko: fonts/ko_standard.ttf
StyleSampler:
method: DatasetSampler
image_home: examples
label_file: examples/image_list.txt
with_label: true
CorpusGenerator:
method: FileCorpus
language: ch
corpus_file: examples/corpus/example.txt
Predictor:
method: StyleTextRecPredictor
algorithm: StyleTextRec
scale: 0.00392156862745098
mean:
- 0.5
- 0.5
- 0.5
std:
- 0.5
- 0.5
- 0.5
expand_result: false
bg_generator:
pretrain: style_text_models/bg_generator
module_name: bg_generator
generator_type: BgGeneratorWithMask
encode_dim: 64
norm_layer: null
conv_block_num: 4
conv_block_dropout: false
conv_block_dilation: true
output_factor: 1.05
text_generator:
pretrain: style_text_models/text_generator
module_name: text_generator
generator_type: TextGenerator
encode_dim: 64
norm_layer: InstanceNorm2D
conv_block_num: 4
conv_block_dropout: false
conv_block_dilation: true
fusion_generator:
pretrain: style_text_models/fusion_generator
module_name: fusion_generator
generator_type: FusionGeneratorSimple
encode_dim: 64
norm_layer: null
conv_block_num: 4
conv_block_dropout: false
conv_block_dilation: true
Writer:
method: SimpleWriter

Binary file not shown.

Before

Width:  |  Height:  |  Size: 168 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 192 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 126 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 148 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 201 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 68 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.6 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 118 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 125 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.5 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.4 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 154 KiB

View File

@ -1,68 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
from utils.logging import get_logger
class FileCorpus(object):
def __init__(self, config):
self.logger = get_logger()
self.logger.info("using FileCorpus")
self.char_list = (
" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
)
corpus_file = config["CorpusGenerator"]["corpus_file"]
self.language = config["CorpusGenerator"]["language"]
with open(corpus_file, "r") as f:
corpus_raw = f.read()
self.corpus_list = corpus_raw.split("\n")[:-1]
assert len(self.corpus_list) > 0
random.shuffle(self.corpus_list)
self.index = 0
def generate(self, corpus_length=0):
if self.index >= len(self.corpus_list):
self.index = 0
random.shuffle(self.corpus_list)
corpus = self.corpus_list[self.index]
if corpus_length != 0:
corpus = corpus[0:corpus_length]
if corpus_length > len(corpus):
self.logger.warning("generated corpus is shorter than expected.")
self.index += 1
return self.language, corpus
class EnNumCorpus(object):
def __init__(self, config):
self.logger = get_logger()
self.logger.info("using NumberCorpus")
self.num_list = "0123456789"
self.en_char_list = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
self.height = config["Global"]["image_height"]
self.max_width = config["Global"]["image_width"]
def generate(self, corpus_length=0):
corpus = ""
if corpus_length == 0:
corpus_length = random.randint(5, 15)
for i in range(corpus_length):
if random.random() < 0.2:
corpus += "{}".format(random.choice(self.en_char_list))
else:
corpus += "{}".format(random.choice(self.num_list))
return "en", corpus

View File

@ -1,146 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import cv2
import math
import paddle
from arch import style_text_rec
from utils.sys_funcs import check_gpu
from utils.logging import get_logger
class StyleTextRecPredictor(object):
def __init__(self, config):
algorithm = config["Predictor"]["algorithm"]
assert algorithm in ["StyleTextRec"], "Generator {} not supported.".format(
algorithm
)
use_gpu = config["Global"]["use_gpu"]
check_gpu(use_gpu)
paddle.set_device("gpu" if use_gpu else "cpu")
self.logger = get_logger()
self.generator = getattr(style_text_rec, algorithm)(config)
self.height = config["Global"]["image_height"]
self.width = config["Global"]["image_width"]
self.scale = config["Predictor"]["scale"]
self.mean = config["Predictor"]["mean"]
self.std = config["Predictor"]["std"]
self.expand_result = config["Predictor"]["expand_result"]
def reshape_to_same_height(self, img_list):
h = img_list[0].shape[0]
for idx in range(1, len(img_list)):
new_w = round(1.0 * img_list[idx].shape[1] / img_list[idx].shape[0] * h)
img_list[idx] = cv2.resize(img_list[idx], (new_w, h))
return img_list
def predict_single_image(self, style_input, text_input):
style_input = self.rep_style_input(style_input, text_input)
tensor_style_input = self.preprocess(style_input)
tensor_text_input = self.preprocess(text_input)
style_text_result = self.generator.forward(
tensor_style_input, tensor_text_input
)
fake_fusion = self.postprocess(style_text_result["fake_fusion"])
fake_text = self.postprocess(style_text_result["fake_text"])
fake_sk = self.postprocess(style_text_result["fake_sk"])
fake_bg = self.postprocess(style_text_result["fake_bg"])
bbox = self.get_text_boundary(fake_text)
if bbox:
left, right, top, bottom = bbox
fake_fusion = fake_fusion[top:bottom, left:right, :]
fake_text = fake_text[top:bottom, left:right, :]
fake_sk = fake_sk[top:bottom, left:right, :]
fake_bg = fake_bg[top:bottom, left:right, :]
# fake_fusion = self.crop_by_text(img_fake_fusion, img_fake_text)
return {
"fake_fusion": fake_fusion,
"fake_text": fake_text,
"fake_sk": fake_sk,
"fake_bg": fake_bg,
}
def predict(self, style_input, text_input_list):
if not isinstance(text_input_list, (tuple, list)):
return self.predict_single_image(style_input, text_input_list)
synth_result_list = []
for text_input in text_input_list:
synth_result = self.predict_single_image(style_input, text_input)
synth_result_list.append(synth_result)
for key in synth_result:
res = [r[key] for r in synth_result_list]
res = self.reshape_to_same_height(res)
synth_result[key] = np.concatenate(res, axis=1)
return synth_result
def preprocess(self, img):
img = (img.astype("float32") * self.scale - self.mean) / self.std
img_height, img_width, channel = img.shape
assert channel == 3, "Please use an rgb image."
ratio = img_width / float(img_height)
if math.ceil(self.height * ratio) > self.width:
resized_w = self.width
else:
resized_w = int(math.ceil(self.height * ratio))
img = cv2.resize(img, (resized_w, self.height))
new_img = np.zeros([self.height, self.width, 3]).astype("float32")
new_img[:, 0:resized_w, :] = img
img = new_img.transpose((2, 0, 1))
img = img[np.newaxis, :, :, :]
return paddle.to_tensor(img)
def postprocess(self, tensor):
img = tensor.numpy()[0]
img = img.transpose((1, 2, 0))
img = (img * self.std + self.mean) / self.scale
img = np.maximum(img, 0.0)
img = np.minimum(img, 255.0)
img = img.astype("uint8")
return img
def rep_style_input(self, style_input, text_input):
rep_num = (
int(
1.2
* (text_input.shape[1] / text_input.shape[0])
/ (style_input.shape[1] / style_input.shape[0])
)
+ 1
)
style_input = np.tile(style_input, reps=[1, rep_num, 1])
max_width = int(self.width / self.height * style_input.shape[0])
style_input = style_input[:, :max_width, :]
return style_input
def get_text_boundary(self, text_img):
img_height = text_img.shape[0]
img_width = text_img.shape[1]
bounder = 3
text_canny_img = cv2.Canny(text_img, 10, 20)
edge_num_h = text_canny_img.sum(axis=0)
no_zero_list_h = np.where(edge_num_h > 0)[0]
edge_num_w = text_canny_img.sum(axis=1)
no_zero_list_w = np.where(edge_num_w > 0)[0]
if len(no_zero_list_h) == 0 or len(no_zero_list_w) == 0:
return None
left = max(no_zero_list_h[0] - bounder, 0)
right = min(no_zero_list_h[-1] + bounder, img_width)
top = max(no_zero_list_w[0] - bounder, 0)
bottom = min(no_zero_list_w[-1] + bounder, img_height)
return [left, right, top, bottom]

View File

@ -1,62 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import random
import cv2
class DatasetSampler(object):
def __init__(self, config):
self.image_home = config["StyleSampler"]["image_home"]
label_file = config["StyleSampler"]["label_file"]
self.dataset_with_label = config["StyleSampler"]["with_label"]
self.height = config["Global"]["image_height"]
self.index = 0
with open(label_file, "r") as f:
label_raw = f.read()
self.path_label_list = label_raw.split("\n")[:-1]
assert len(self.path_label_list) > 0
random.shuffle(self.path_label_list)
def sample(self):
if self.index >= len(self.path_label_list):
random.shuffle(self.path_label_list)
self.index = 0
if self.dataset_with_label:
path_label = self.path_label_list[self.index]
rel_image_path, label = path_label.split("\t")
else:
rel_image_path = self.path_label_list[self.index]
label = None
img_path = "{}/{}".format(self.image_home, rel_image_path)
image = cv2.imread(img_path)
origin_height = image.shape[0]
ratio = self.height / origin_height
width = int(image.shape[1] * ratio)
height = int(image.shape[0] * ratio)
image = cv2.resize(image, (width, height))
self.index += 1
if label:
return {"image": image, "label": label}
else:
return {"image": image}
def duplicate_image(image, width):
image_width = image.shape[1]
dup_num = width // image_width + 1
image = np.tile(image, reps=[1, dup_num, 1])
cropped_image = image[:, :width, :]
return cropped_image

View File

@ -1,79 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import numpy as np
import cv2
from utils.config import ArgsParser, load_config, override_config
from utils.logging import get_logger
from engine import style_samplers, corpus_generators, text_drawers, predictors, writers
class ImageSynthesiser(object):
def __init__(self):
self.FLAGS = ArgsParser().parse_args()
self.config = load_config(self.FLAGS.config)
self.config = override_config(self.config, options=self.FLAGS.override)
self.output_dir = self.config["Global"]["output_dir"]
if not os.path.exists(self.output_dir):
os.mkdir(self.output_dir)
self.logger = get_logger(log_file="{}/predict.log".format(self.output_dir))
self.text_drawer = text_drawers.StdTextDrawer(self.config)
predictor_method = self.config["Predictor"]["method"]
assert predictor_method is not None
self.predictor = getattr(predictors, predictor_method)(self.config)
def synth_image(self, corpus, style_input, language="en"):
corpus_list, text_input_list = self.text_drawer.draw_text(
corpus, language, style_input_width=style_input.shape[1]
)
synth_result = self.predictor.predict(style_input, text_input_list)
return synth_result
class DatasetSynthesiser(ImageSynthesiser):
def __init__(self):
super(DatasetSynthesiser, self).__init__()
self.tag = self.FLAGS.tag
self.output_num = self.config["Global"]["output_num"]
corpus_generator_method = self.config["CorpusGenerator"]["method"]
self.corpus_generator = getattr(corpus_generators, corpus_generator_method)(
self.config
)
style_sampler_method = self.config["StyleSampler"]["method"]
assert style_sampler_method is not None
self.style_sampler = style_samplers.DatasetSampler(self.config)
self.writer = writers.SimpleWriter(self.config, self.tag)
def synth_dataset(self):
for i in range(self.output_num):
style_data = self.style_sampler.sample()
style_input = style_data["image"]
corpus_language, text_input_label = self.corpus_generator.generate()
text_input_label_list, text_input_list = self.text_drawer.draw_text(
text_input_label,
corpus_language,
style_input_width=style_input.shape[1],
)
text_input_label = "".join(text_input_label_list)
synth_result = self.predictor.predict(style_input, text_input_list)
fake_fusion = synth_result["fake_fusion"]
self.writer.save_image(fake_fusion, text_input_label)
self.writer.save_label()
self.writer.merge_label()

View File

@ -1,84 +0,0 @@
from PIL import Image, ImageDraw, ImageFont
import numpy as np
import cv2
from utils.logging import get_logger
class StdTextDrawer(object):
def __init__(self, config):
self.logger = get_logger()
self.max_width = config["Global"]["image_width"]
self.char_list = (
" 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
)
self.height = config["Global"]["image_height"]
self.font_dict = {}
self.load_fonts(config["TextDrawer"]["fonts"])
self.support_languages = list(self.font_dict)
def load_fonts(self, fonts_config):
for language in fonts_config:
font_path = fonts_config[language]
font_height = self.get_valid_height(font_path)
font = ImageFont.truetype(font_path, font_height)
self.font_dict[language] = font
def get_valid_height(self, font_path):
font = ImageFont.truetype(font_path, self.height - 4)
left, top, right, bottom = font.getbbox(self.char_list)
_, font_height = right - left, bottom - top
if font_height <= self.height - 4:
return self.height - 4
else:
return int((self.height - 4) ** 2 / font_height)
def draw_text(self, corpus, language="en", crop=True, style_input_width=None):
if language not in self.support_languages:
self.logger.warning(
"language {} not supported, use en instead.".format(language)
)
language = "en"
if crop:
width = min(self.max_width, len(corpus) * self.height) + 4
else:
width = len(corpus) * self.height + 4
if style_input_width is not None:
width = min(width, style_input_width)
corpus_list = []
text_input_list = []
while len(corpus) != 0:
bg = Image.new("RGB", (width, self.height), color=(127, 127, 127))
draw = ImageDraw.Draw(bg)
char_x = 2
font = self.font_dict[language]
i = 0
while i < len(corpus):
char_i = corpus[i]
char_size = font.getbbox(char_i)[2]
# split when char_x exceeds char size and index is not 0 (at least 1 char should be wroten on the image)
if char_x + char_size >= width and i != 0:
text_input = np.array(bg).astype(np.uint8)
text_input = text_input[:, 0:char_x, :]
corpus_list.append(corpus[0:i])
text_input_list.append(text_input)
corpus = corpus[i:]
i = 0
break
draw.text((char_x, 2), char_i, fill=(0, 0, 0), font=font)
char_x += char_size
i += 1
# the whole text is shorter than style input
if i == len(corpus):
text_input = np.array(bg).astype(np.uint8)
text_input = text_input[:, 0:char_x, :]
corpus_list.append(corpus[0:i])
text_input_list.append(text_input)
break
return corpus_list, text_input_list

View File

@ -1,69 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import glob
from utils.logging import get_logger
class SimpleWriter(object):
def __init__(self, config, tag):
self.logger = get_logger()
self.output_dir = config["Global"]["output_dir"]
self.counter = 0
self.label_dict = {}
self.tag = tag
self.label_file_index = 0
def save_image(self, image, text_input_label):
image_home = os.path.join(self.output_dir, "images", self.tag)
if not os.path.exists(image_home):
os.makedirs(image_home)
image_path = os.path.join(image_home, "{}.png".format(self.counter))
# todo support continue synth
cv2.imwrite(image_path, image)
self.logger.info("generate image: {}".format(image_path))
image_name = os.path.join(self.tag, "{}.png".format(self.counter))
self.label_dict[image_name] = text_input_label
self.counter += 1
if not self.counter % 100:
self.save_label()
def save_label(self):
label_raw = ""
label_home = os.path.join(self.output_dir, "label")
if not os.path.exists(label_home):
os.mkdir(label_home)
for image_path in self.label_dict:
label = self.label_dict[image_path]
label_raw += "{}\t{}\n".format(image_path, label)
label_file_path = os.path.join(label_home, "{}_label.txt".format(self.tag))
with open(label_file_path, "w") as f:
f.write(label_raw)
self.label_file_index += 1
def merge_label(self):
label_raw = ""
label_file_regex = os.path.join(self.output_dir, "label", "*_label.txt")
label_file_list = glob.glob(label_file_regex)
for label_file_i in label_file_list:
with open(label_file_i, "r") as f:
label_raw += f.read()
label_file_path = os.path.join(self.output_dir, "label.txt")
with open(label_file_path, "w") as f:
f.write(label_raw)

View File

@ -1,2 +0,0 @@
Paddle
飞桨文字识别

View File

@ -1,2 +0,0 @@
style_images/1.jpg NEATNESS
style_images/2.jpg 锁店君和宾馆

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.5 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.3 KiB

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -1,31 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, "..")))
from engine.synthesisers import DatasetSynthesiser
def synth_dataset():
dataset_synthesiser = DatasetSynthesiser()
dataset_synthesiser.synth_dataset()
if __name__ == "__main__":
synth_dataset()

View File

@ -1,82 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import sys
import glob
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, "..")))
from utils.config import ArgsParser
from engine.synthesisers import ImageSynthesiser
def synth_image():
args = ArgsParser().parse_args()
image_synthesiser = ImageSynthesiser()
style_image_path = args.style_image
img = cv2.imread(style_image_path)
text_corpus = args.text_corpus
language = args.language
synth_result = image_synthesiser.synth_image(text_corpus, img, language)
fake_fusion = synth_result["fake_fusion"]
fake_text = synth_result["fake_text"]
fake_bg = synth_result["fake_bg"]
cv2.imwrite("fake_fusion.jpg", fake_fusion)
cv2.imwrite("fake_text.jpg", fake_text)
cv2.imwrite("fake_bg.jpg", fake_bg)
def batch_synth_images():
image_synthesiser = ImageSynthesiser()
corpus_file = "../StyleTextRec_data/test_20201208/test_text_list.txt"
style_data_dir = "../StyleTextRec_data/test_20201208/style_images/"
save_path = "./output_data/"
corpus_list = []
with open(corpus_file, "rb") as fin:
lines = fin.readlines()
for line in lines:
substr = line.decode("utf-8").strip("\n").split("\t")
corpus_list.append(substr)
style_img_list = glob.glob("{}/*.jpg".format(style_data_dir))
corpus_num = len(corpus_list)
style_img_num = len(style_img_list)
for cno in range(corpus_num):
for sno in range(style_img_num):
corpus, lang = corpus_list[cno]
style_img_path = style_img_list[sno]
img = cv2.imread(style_img_path)
synth_result = image_synthesiser.synth_image(corpus, img, lang)
fake_fusion = synth_result["fake_fusion"]
fake_text = synth_result["fake_text"]
fake_bg = synth_result["fake_bg"]
for tp in range(2):
if tp == 0:
prefix = "%s/c%d_s%d_" % (save_path, cno, sno)
else:
prefix = "%s/s%d_c%d_" % (save_path, sno, cno)
cv2.imwrite("%s_fake_fusion.jpg" % prefix, fake_fusion)
cv2.imwrite("%s_fake_text.jpg" % prefix, fake_text)
cv2.imwrite("%s_fake_bg.jpg" % prefix, fake_bg)
cv2.imwrite("%s_input_style.jpg" % prefix, img)
print(cno, corpus_num, sno, style_img_num)
if __name__ == "__main__":
# batch_synth_images()
synth_image()

View File

@ -1,218 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import yaml
import os
from argparse import ArgumentParser, RawDescriptionHelpFormatter
def override(dl, ks, v):
"""
Recursively replace dict of list
Args:
dl(dict or list): dict or list to be replaced
ks(list): list of keys
v(str): value to be replaced
"""
def str2num(v):
try:
return eval(v)
except Exception:
return v
assert isinstance(dl, (list, dict)), "{} should be a list or a dict"
assert len(ks) > 0, "lenght of keys should larger than 0"
if isinstance(dl, list):
k = str2num(ks[0])
if len(ks) == 1:
assert k < len(dl), "index({}) out of range({})".format(k, dl)
dl[k] = str2num(v)
else:
override(dl[k], ks[1:], v)
else:
if len(ks) == 1:
# assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
if not ks[0] in dl:
logger.warning("A new filed ({}) detected!".format(ks[0], dl))
dl[ks[0]] = str2num(v)
else:
assert (
ks[0] in dl
), "({}) doesn't exist in {}, a new dict field is invalid".format(ks[0], dl)
override(dl[ks[0]], ks[1:], v)
def override_config(config, options=None):
"""
Recursively override the config
Args:
config(dict): dict to be replaced
options(list): list of pairs(key0.key1.idx.key2=value)
such as: [
'topk=2',
'VALID.transforms.1.ResizeImage.resize_short=300'
]
Returns:
config(dict): replaced config
"""
if options is not None:
for opt in options:
assert isinstance(opt, str), "option({}) should be a str".format(opt)
assert "=" in opt, (
"option({}) should contain a ="
"to distinguish between key and value".format(opt)
)
pair = opt.split("=")
assert len(pair) == 2, "there can be only a = in the option"
key, value = pair
keys = key.split(".")
override(config, keys, value)
return config
class ArgsParser(ArgumentParser):
def __init__(self):
super(ArgsParser, self).__init__(formatter_class=RawDescriptionHelpFormatter)
self.add_argument("-c", "--config", help="configuration file to use")
self.add_argument("-t", "--tag", default="0", help="tag for marking worker")
self.add_argument(
"-o",
"--override",
action="append",
default=[],
help="config options to be overridden",
)
self.add_argument(
"--style_image",
default="examples/style_images/1.jpg",
help="tag for marking worker",
)
self.add_argument(
"--text_corpus", default="PaddleOCR", help="tag for marking worker"
)
self.add_argument("--language", default="en", help="tag for marking worker")
def parse_args(self, argv=None):
args = super(ArgsParser, self).parse_args(argv)
assert args.config is not None, "Please specify --config=configure_file_path."
return args
def load_config(file_path):
"""
Load config from yml/yaml file.
Args:
file_path (str): Path of the config file to be loaded.
Returns: config
"""
ext = os.path.splitext(file_path)[1]
assert ext in [".yml", ".yaml"], "only support yaml files for now"
with open(file_path, "rb") as f:
config = yaml.load(f, Loader=yaml.Loader)
return config
def gen_config():
base_config = {
"Global": {
"algorithm": "SRNet",
"use_gpu": True,
"start_epoch": 1,
"stage1_epoch_num": 100,
"stage2_epoch_num": 100,
"log_smooth_window": 20,
"print_batch_step": 2,
"save_model_dir": "./output/SRNet",
"use_visualdl": False,
"save_epoch_step": 10,
"vgg_pretrain": "./pretrained/VGG19_pretrained",
"vgg_load_static_pretrain": True,
},
"Architecture": {
"model_type": "data_aug",
"algorithm": "SRNet",
"net_g": {
"name": "srnet_net_g",
"encode_dim": 64,
"norm": "batch",
"use_dropout": False,
"init_type": "xavier",
"init_gain": 0.02,
"use_dilation": 1,
},
# input_nc, ndf, netD,
# n_layers_D=3, norm='instance', use_sigmoid=False, init_type='normal', init_gain=0.02, gpu_id='cuda:0'
"bg_discriminator": {
"name": "srnet_bg_discriminator",
"input_nc": 6,
"ndf": 64,
"netD": "basic",
"norm": "none",
"init_type": "xavier",
},
"fusion_discriminator": {
"name": "srnet_fusion_discriminator",
"input_nc": 6,
"ndf": 64,
"netD": "basic",
"norm": "none",
"init_type": "xavier",
},
},
"Loss": {"lamb": 10, "perceptual_lamb": 1, "muvar_lamb": 50, "style_lamb": 500},
"Optimizer": {
"name": "Adam",
"learning_rate": {"name": "lambda", "lr": 0.0002, "lr_decay_iters": 50},
"beta1": 0.5,
"beta2": 0.999,
},
"Train": {
"batch_size_per_card": 8,
"num_workers_per_card": 4,
"dataset": {
"delimiter": "\t",
"data_dir": "/",
"label_file": "tmp/label.txt",
"transforms": [
{
"DecodeImage": {
"to_rgb": True,
"to_np": False,
"channel_first": False,
}
},
{
"NormalizeImage": {
"scale": 1.0 / 255.0,
"mean": [0.485, 0.456, 0.406],
"std": [0.229, 0.224, 0.225],
"order": None,
}
},
{"ToCHWImage": None},
],
},
},
}
with open("config.yml", "w") as f:
yaml.dump(base_config, f)
if __name__ == "__main__":
gen_config()

View File

@ -1,26 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import paddle
__all__ = ["load_dygraph_pretrain"]
def load_dygraph_pretrain(model, logger, path=None, load_static_weights=False):
if not os.path.exists(path + ".pdparams"):
raise ValueError("Model pretrain path {} does not " "exists.".format(path))
param_state_dict = paddle.load(path + ".pdparams")
model.set_state_dict(param_state_dict)
logger.info("load pretrained model from {}".format(path))
return

View File

@ -1,65 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import logging
import functools
import paddle.distributed as dist
logger_initialized = {}
@functools.lru_cache()
def get_logger(name="srnet", log_file=None, log_level=logging.INFO):
"""Initialize and get a logger by name.
If the logger has not been initialized, this method will initialize the
logger by adding one or two handlers, otherwise the initialized logger will
be directly returned. During initialization, a StreamHandler will always be
added. If `log_file` is specified a FileHandler will also be added.
Args:
name (str): Logger name.
log_file (str | None): The log filename. If specified, a FileHandler
will be added to the logger.
log_level (int): The logger level. Note that only the process of
rank 0 is affected, and other processes will set the level to
"Error" thus be silent most of the time.
Returns:
logging.Logger: The expected logger.
"""
logger = logging.getLogger(name)
if name in logger_initialized:
return logger
for logger_name in logger_initialized:
if name.startswith(logger_name):
return logger
formatter = logging.Formatter(
"[%(asctime)s] %(name)s %(levelname)s: %(message)s", datefmt="%Y/%m/%d %H:%M:%S"
)
stream_handler = logging.StreamHandler(stream=sys.stdout)
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
if log_file is not None and dist.get_rank() == 0:
log_file_folder = os.path.split(log_file)[0]
os.makedirs(log_file_folder, exist_ok=True)
file_handler = logging.FileHandler(log_file, "a")
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
if dist.get_rank() == 0:
logger.setLevel(log_level)
else:
logger.setLevel(logging.ERROR)
logger_initialized[name] = True
return logger

View File

@ -1,48 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
def compute_mean_covariance(img):
batch_size = img.shape[0]
channel_num = img.shape[1]
height = img.shape[2]
width = img.shape[3]
num_pixels = height * width
# batch_size * channel_num * 1 * 1
mu = img.mean(2, keepdim=True).mean(3, keepdim=True)
# batch_size * channel_num * num_pixels
img_hat = img - mu.expand_as(img)
img_hat = img_hat.reshape([batch_size, channel_num, num_pixels])
# batch_size * num_pixels * channel_num
img_hat_transpose = img_hat.transpose([0, 2, 1])
# batch_size * channel_num * channel_num
covariance = paddle.bmm(img_hat, img_hat_transpose)
covariance = covariance / num_pixels
return mu, covariance
def dice_coefficient(y_true_cls, y_pred_cls, training_mask):
eps = 1e-5
intersection = paddle.sum(y_true_cls * y_pred_cls * training_mask)
union = (
paddle.sum(y_true_cls * training_mask)
+ paddle.sum(y_pred_cls * training_mask)
+ eps
)
loss = 1.0 - (2 * intersection / union)
return loss

View File

@ -1,74 +0,0 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
import errno
import paddle
def get_check_global_params(mode):
check_params = [
"use_gpu",
"max_text_length",
"image_shape",
"image_shape",
"character_type",
"loss_type",
]
if mode == "train_eval":
check_params = check_params + [
"train_batch_size_per_card",
"test_batch_size_per_card",
]
elif mode == "test":
check_params = check_params + ["test_batch_size_per_card"]
return check_params
def check_gpu(use_gpu):
"""
Log error and exit when set use_gpu=true in paddlepaddle
cpu version.
"""
err = (
"Config use_gpu cannot be set as true while you are "
"using paddlepaddle cpu version ! \nPlease try: \n"
"\t1. Install paddlepaddle-gpu to run model on GPU \n"
"\t2. Set use_gpu as false in config file to run "
"model on CPU"
)
if use_gpu:
try:
if not paddle.is_compiled_with_cuda():
print(err)
sys.exit(1)
except:
print("Fail to check gpu state.")
sys.exit(1)
def _mkdir_if_not_exist(path, logger):
"""
mkdir if not exists, ignore the exception when multiprocess mkdir together
"""
if not os.path.exists(path):
try:
os.makedirs(path)
except OSError as e:
if e.errno == errno.EEXIST and os.path.isdir(path):
logger.warning(
"be happy if some process has already created {}".format(path)
)
else:
raise OSError("Failed to mkdir {}".format(path))

View File

@ -192,7 +192,7 @@ A可以看下训练的尺度和预测的尺度是否相同如果训练的
#### Q: 如何识别竹简上的古文?
**A**对于字符都是普通的汉字字符的情况只要标注足够的数据finetune模型就可以了。如果数据量不足您可以尝试StyleText工具。
**A**对于字符都是普通的汉字字符的情况只要标注足够的数据finetune模型就可以了。如果数据量不足您可以尝试[StyleText](https://github.com/PFCCLab/StyleText)工具。
而如果使用的字符是特殊的古文字、甲骨文、象形文字等,那么首先需要构建一个古文字的字典,之后再进行训练。
#### Q: 只想要识别票据中的部分片段,重新训练它的话,只需要训练文本检测模型就可以了吗?问文本识别,方向分类还是用原来的模型这样可以吗?
@ -355,6 +355,9 @@ A当训练数据量少时可以尝试以下三种方式获取更多的数
### 2.4 数据标注与生成
> [!NOTE]
> StyleText 已经移动到 [PFCCLab/StyleText](https://github.com/PFCCLab/StyleText)
#### Q: Style-Text 如何不文字风格迁移,就像普通文本生成程序一样默认字体直接输出到分割的背景图?
**A**使用image_synth模式会输出fake_bg.jpg即为背景图。如果想要批量提取背景可以稍微修改一下代码将fake_bg保存下来即可。要修改的位置
@ -371,7 +374,7 @@ StyleText的用途主要是提取style_image中的字体、背景等style信
#### Q: StyleText批量生成图片为什么没有输出
**A**:需要检查以下您配置文件中的路径是否都存在。尤其要注意的是[label_file配置](https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/StyleText/README_ch.md#%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8B)。
**A**:需要检查以下您配置文件中的路径是否都存在。尤其要注意的是[label_file配置](https://github.com/PFCCLab/StyleText/blob/main/README_ch.md#%E4%B8%89%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8B)。
如果您使用的style_image输入没有label信息您依然需要提供一个图片文件列表。
#### Q使用StyleText进行数据合成时文本(TextInput)的长度远超StyleInput的长度该怎么处理与合成呢

View File

@ -111,7 +111,7 @@ PaddleOCR主要聚焦通用OCR如果有垂类需求您可以用PaddleOCR+
a. 人工采集更多的训练数据,最直接也是最有效的方式。
b. 基于PIL和opencv基本图像处理或者变换。例如PIL中ImageFont, Image, ImageDraw三个模块将文字写到背景中opencv的旋转仿射变换高斯滤波等。
c. 利用数据生成算法合成数据例如pix2pix或StyleText等算法。
c. 利用数据生成算法合成数据例如pix2pix或[StyleText](https://github.com/PFCCLab/StyleText)等算法。
<a name="常见问题"></a>

View File

@ -11,7 +11,7 @@
- 2021.8.3 发布PaddleOCR v2.2,新增文档结构分析[PP-Structure](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/ppstructure/README_ch.md)工具包支持版面分析与表格识别含Excel导出
- 2021.6.29 [FAQ](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/FAQ.md)新增5个高频问题总数248个每周一都会更新欢迎大家持续关注。
- 2021.4.8 release 2.1版本新增AAAI 2021论文[端到端识别算法PGNet](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/pgnet.md)开源,[多语言模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.2/doc/doc_ch/multi_languages.md)支持种类增加到80+。
- 2020.12.15 更新数据合成工具[Style-Text](../../StyleText/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。
- 2020.12.15 更新数据合成工具[Style-Text](https://github.com/PFCCLab/StyleText/blob/main/README_ch.md),可以批量合成大量与目标场景类似的图像,在多个场景验证,效果明显提升。
- 2020.12.07 [FAQ](../../doc/doc_ch/FAQ.md)新增5个高频问题总数124个并且计划以后每周一都会更新欢迎大家持续关注。
- 2020.11.25 更新半自动标注工具[PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel/blob/main/README_ch.md)辅助开发者高效完成标注任务输出格式与PP-OCR训练任务完美衔接。
- 2020.9.22 更新PP-OCR技术文章https://arxiv.org/abs/2009.09941

View File

@ -11,7 +11,7 @@
- 2021.4.8 release end-to-end text recognition algorithm [PGNet](https://www.aaai.org/AAAI21Papers/AAAI-2885.WangP.pdf) which is published in AAAI 2021. Find tutorial [here](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/pgnet_en.md)release multi language recognition [models](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/multi_languages_en.md), support more than 80 languages recognition; especically, the performance of [English recognition model](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/doc/doc_en/models_list_en.md#English) is Optimized.
- 2021.1.21 update more than 25+ multilingual recognition models [models list](./models_list_en.md), includingEnglish, Chinese, German, French, JapaneseSpanishPortuguese Russia Arabic and so on. Models for more languages will continue to be updated [Develop Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
- 2020.12.15 update Data synthesis tool, i.e., [Style-Text](../../StyleText/README.md)easy to synthesize a large number of images which are similar to the target scene image.
- 2020.12.15 update Data synthesis tool, i.e., [Style-Text](https://github.com/PFCCLab/StyleText/blob/main/README.md)easy to synthesize a large number of images which are similar to the target scene image.
- 2020.11.25 Update a new data annotation tool, i.e., [PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel/blob/main/README.md), which is helpful to improve the labeling efficiency. Moreover, the labeling results can be used in training of the PP-OCR system directly.
- 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M, suitable for mobile deployment.

View File

@ -140,7 +140,7 @@ paddleocr --image_dir /your/test/image.jpg --lang=ru
- [Добавьте новые алгоритмы в PaddleOCR](../doc_en/add_new_algorithm_en.md)
- Аннотации и синтез данных
- [Полуавтоматический инструмент аннотации данных: метка ППOCRR](https://github.com/PFCCLab/PPOCRLabel/blob/main/README.md)
- [Инструмент синтеза данных: Стиль-текст](./StyleText/README.md)
- [Инструмент синтеза данных: Стиль-текст](https://github.com/PFCCLab/StyleText/blob/main/README.md)
- [Другие инструменты аннотирования данных](../doc_en/data_annotation_en.md)
- [Другие инструменты синтеза данных](../doc_en/data_synthesis_en.md)
- Наборы данных

View File

@ -137,7 +137,7 @@ paddleocr --image_dir /your/test/image.jpg --lang=hi
- [पैडलओसीआर में नए एल्गोरिदम जोड़ें](../doc_en/add_new_algorithm_en.md)
- डेटा एनोटेशन और सिंथेसिस
- [सेमी-ऑटोमैटिक एनोटेशन टूल: PPओसीआरलेबल](https://github.com/PFCCLab/PPOCRLabel/blob/main/README.md)
- [डेटा सिंथेसिस टूल: स्टाइल-टेक्सट](./StyleText/README.md)
- [डेटा सिंथेसिस टूल: स्टाइल-टेक्सट](https://github.com/PFCCLab/StyleText/blob/main/README.md)
- [अन्य डेटा एनोटेशन टूल](../doc_en/data_annotation_en.md)
- [अन्य डेटा सिंथेसिस टूल](../doc_en/data_synthesis_en.md)
- डेटा सेट

View File

@ -137,7 +137,7 @@ paddleocr --image_dir /your/test/image.jpg --lang=japan # change for i18n abbr
- [PaddleOCR に新しいアルゴリズムを追加する](../doc_en/add_new_algorithm_en.md)
- データの注釈と合成
- [半自動注釈ツール: PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel/blob/main/README.md)
- [データ合成ツール: Style-Text](./StyleText/README.md)
- [データ合成ツール: Style-Text](https://github.com/PFCCLab/StyleText/blob/main/README.md)
- [その他のデータ注釈ツール](../doc_en/data_annotation_en.md)
- [その他のデータ合成ツール](../doc_en/data_synthesis_en.md)
- データセット

View File

@ -135,7 +135,7 @@ paddleocr --image_dir /your/test/image.jpg --lang=korean
- [PaddleOCR에 신규 알고리즘 추가](../doc_en/add_new_algorithm_en.md)
- 데이터 주석 및 합성
- [반-자동 주석 툴: PPOCRLabel](https://github.com/PFCCLab/PPOCRLabel/blob/main/README.md)
- [데이터 합성 툴: 스타일-텍스트](./StyleText/README.md)
- [데이터 합성 툴: 스타일-텍스트](https://github.com/PFCCLab/StyleText/blob/main/README.md)
- [기타 데이터 주석 툴](../doc_en/data_annotation_en.md)
- [기타 데이터 합성 툴](../doc_en/data_synthesis_en.md)
- 데이터세트