Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleClas into benchmark

pull/1245/head
dongshuilong 2021-10-13 09:26:28 +00:00
commit 272bc9481d
152 changed files with 8357 additions and 933 deletions
.gitignore

@ -3,10 +3,11 @@ __pycache__/
*.sw*
*/workerlog*
checkpoints/
output/
output*/
pretrained/
.ipynb_checkpoints/
*.ipynb*
_build/
build/
log/
nohup.out


@ -7,7 +7,7 @@
PaddleClas is an image recognition toolkit prepared by PaddlePaddle for industry and academia, helping users train better vision models and deploy them in real applications.
**Recent updates**
- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas; these models are highly competitive on Intel CPUs. The metrics and pretrained weights can be downloaded [here](docs/zh_CN/ImageNet_models.md).
- 2021.08.11 Updated 7 [FAQ](docs/zh_CN/faq_series/faq_2021_s2.md) entries.
- 2021.06.29 Added the Swin Transformer series of models; the highest top-1 accuracy on the ImageNet-1k dataset reaches 87.2%. Training, inference, evaluation and whl-package deployment are supported; pretrained models can be downloaded [here](docs/zh_CN/models/models_intro.md).
- 2021.06.22/23/24 The PaddleClas R&D team presented a three-day live course with in-depth technical explanations. Course replay: [https://aistudio.baidu.com/aistudio/course/introduce/24519](https://aistudio.baidu.com/aistudio/course/introduce/24519)


@ -8,6 +8,8 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
**Recent updates**
- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas; these models show strong competitiveness on Intel CPUs. The metrics and pretrained models are available [here](docs/en/ImageNet_models_en.md).
- 2021.06.29 Added the Swin Transformer series of models; the highest top-1 accuracy on the ImageNet-1k dataset reaches 87.2%. Training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
- 2021.06.16 PaddleClas release/2.2. Added the metric learning and vector search modules, along with product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet and RedNet, whose accuracy is roughly the same as reported in the papers.
- [more](./docs/en/update_history_en.md)


@ -0,0 +1,33 @@
Global:
  infer_imgs: "./images/0517_2715693311.jpg"
  inference_model_dir: "../inference/"
  batch_size: 1
  use_gpu: True
  enable_mkldnn: False
  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True
  use_tensorrt: False
  gpu_mem: 8000
  enable_profile: False
PreProcess:
  transform_ops:
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 0.00392157
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
        channel_num: 3
    - ToCHWImage:
PostProcess:
  main_indicator: MultiLabelTopk
  MultiLabelTopk:
    topk: 5
    class_id_map_file: None
  SavePreLabel:
    save_dir: ./pre_label/
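For reference, the `MultiLabelTopk` post-process configured above keeps every class whose probability reaches 0.5 (see the `postprocess.py` change later in this commit); a minimal sketch of that selection rule, using a hypothetical probability vector:

```python
# Sketch of the MultiLabelTopk selection rule: keep classes with prob >= 0.5.
# The probability vector below is a hypothetical example, not an output of this config.
import numpy as np

probs = np.array([0.96, 0.12, 0.56, 0.55, 0.99, 0.33])
class_ids = np.where(probs >= 0.5)[0].astype("int32")
print(class_ids)  # [0 2 3 4]
```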

(new binary image added, 16 KiB)


@ -4,9 +4,9 @@
PaddleClas provides two service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../../deploy/hubserving/readme_en.md)
- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". Please follow this tutorial.
- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". If you prefer the retrieval-based image recognition service, please refer to the [tutorial](./recognition/README.md); if you'd like the image classification service, please follow this tutorial.
# Service deployment based on PaddleServing
# Image Classification Service deployment based on PaddleServing
This document will introduce how to use the [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the ResNet50_vd model as a pipeline online service.
@ -131,7 +131,7 @@ fetch_var {
config.yml # configuration file of starting the service
pipeline_http_client.py # script to send pipeline prediction request by http
pipeline_rpc_client.py # script to send pipeline prediction request by rpc
resnet50_web_service.py # start the script of the pipeline server
classification_web_service.py # start the script of the pipeline server
```
2. Run the following command to start the service.
@ -147,7 +147,7 @@ fetch_var {
python3 pipeline_http_client.py
```
After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
![](./imgs/results.png)
![](./imgs/results.png)
Adjust the number of concurrency in config.yml to get the largest QPS.


@ -4,9 +4,9 @@
PaddleClas provides two service deployment methods:
- Deployment based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; see the [document](../../deploy/hubserving/readme.md) for usage.
- Deployment based on PaddleServing: the code path is "`./deploy/paddleserving`"; follow this tutorial.
- Deployment based on PaddleServing: the code path is "`./deploy/paddleserving`". For the retrieval-based image recognition service, refer to the [document](./recognition/README_CN.md); for the image classification service, follow this tutorial.
# Service deployment based on PaddleServing
# Image classification service deployment based on PaddleServing
Taking the classic ResNet50_vd model as an example, this document introduces how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy a pipeline online service for PaddleClas dygraph models.
@ -127,7 +127,7 @@ fetch_var {
config.yml                    # configuration file for starting the service
pipeline_http_client.py       # script to send pipeline prediction requests over http
pipeline_rpc_client.py        # script to send pipeline prediction requests over rpc
resnet50_web_service.py       # script to start the pipeline server
classification_web_service.py # script to start the pipeline server
```
2. Run the following command to start the service:

(new binary image added, 28 KiB)

(new binary image added, 123 KiB)


@ -0,0 +1,178 @@
# Product Recognition Service deployment based on PaddleServing
(English|[简体中文](./README_CN.md))
This document will introduce how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the retrieval-based product recognition model as a pipeline online service.
Some key features of Paddle Serving:
- Integrates with the Paddle training pipeline seamlessly; most Paddle models can be deployed with a one-line command.
- Industrial serving features are supported, such as model management, online loading and online A/B testing.
- Highly concurrent and efficient communication between clients and servers is supported.
For an introduction to the Paddle Serving deployment framework and a tutorial, refer to the [document](https://github.com/PaddlePaddle/Serving/blob/develop/README.md).
## Contents
- [Environmental preparation](#environmental-preparation)
- [Model conversion](#model-conversion)
- [Paddle Serving pipeline deployment](#paddle-serving-pipeline-deployment)
- [FAQ](#faq)
<a name="environmental-preparation"></a>
## Environmental preparation
Both the PaddleClas and PaddleServing operating environments are needed.
1. Please prepare the PaddleClas operating environment by referring to this [link](../../docs/zh_CN/tutorials/install.md).
Download the paddle whl package matching your environment; version 2.1.0 is recommended.
2. The steps to prepare the PaddleServing operating environment are as follows:
Install paddle-serving-server, which is used to start the service:
```
pip3 install paddle-serving-server==0.6.1 # for CPU
pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
# Other GPU environments need to confirm the environment and then choose to execute the following commands
pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
```
3. Install the client, which is used to send requests to the service.
Find the client installation package corresponding to your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
Python 3.7 is recommended here:
```
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
```
4. Install serving-app
```
pip3 install paddle-serving-app==0.6.1
```
**Note:** If you want to install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
<a name="model-conversion"></a>
## Model conversion
When using PaddleServing for service deployment, you need to convert the saved inference model into a serving model that is easy to deploy.
The following assumes that the current working directory is the PaddleClas root directory.
First, download the ResNet50_vd inference model for product recognition:
```
cd deploy
# Download and unzip the ResNet50_vd model
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd models
tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
```
Then, you can use the installed paddle_serving_client tool to convert the inference model into a model format that is easy for the server to deploy.
```
# Product recognition model conversion
python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
--serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
```
After the ResNet50_vd inference model is converted, there will be additional folders of `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client` in the current folder, with the following format:
```
|- product_ResNet50_vd_aliproduct_v1.0_serving/
|- __model__
|- __params__
|- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt
|- product_ResNet50_vd_aliproduct_v1.0_client
|- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt
```
Once you have the model files for deployment, you need to change the alias name in `serving_server_conf.prototxt`: change `alias_name` in `fetch_var` to `features`.
The modified serving_server_conf.prototxt file is as follows:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
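If you would rather script this edit than modify the file by hand, a minimal sketch is shown below. It assumes the converted model lives in `product_ResNet50_vd_aliproduct_v1.0_serving/` and that the fetch variable's default alias equals its name, `save_infer_model/scale_0.tmp_1`, as in the example above:

```python
# Sketch: rewrite the fetch_var alias_name in serving_server_conf.prototxt.
# The path and the original alias string are assumptions based on the example above.
from pathlib import Path

conf_path = Path("product_ResNet50_vd_aliproduct_v1.0_serving/serving_server_conf.prototxt")
text = conf_path.read_text()
text = text.replace('alias_name: "save_infer_model/scale_0.tmp_1"',
                    'alias_name: "features"')
conf_path.write_text(text)
```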
Next, download and unpack the pre-built index of the product gallery:
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
```
<a name="paddle-serving-pipeline-deployment"></a>
## Paddle Serving pipeline deployment
**Attention:** the pipeline deployment mode does not support the Windows platform.
1. Download the PaddleClas code; if you have already downloaded it, you can skip this step.
```
git clone https://github.com/PaddlePaddle/PaddleClas
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
```
The paddleserving directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # configuration file of starting the service
pipeline_http_client.py # script to send pipeline prediction request by http
pipeline_rpc_client.py # script to send pipeline prediction request by rpc
recognition_web_service.py # start the script of the pipeline server
```
2. Run the following command to start the service.
```
# Start the service and save the running log in log.txt
python3 recognition_web_service.py &>log.txt &
```
After the service is successfully started, a log similar to the following will be printed in log.txt
![](../imgs/start_server_recog.png)
3. Send service request
```
python3 pipeline_http_client.py
```
After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
![](../imgs/results_recog.png)
Adjust the number of concurrency in config.yml to get the largest QPS.
```
op:
concurrency: 8
...
```
Multiple service requests can be sent at the same time if necessary.
The predicted performance data will be automatically written into the `PipelineServingLogs/pipeline.tracer` file.
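As an illustration of sending several requests at once, here is a minimal sketch that posts concurrent requests to the service started above. It assumes the service is listening on 127.0.0.1:18081 and that the demo image `daoxiangcunjinzhubing_6.jpg` is in the current directory, as in `pipeline_http_client.py`:

```python
# Sketch: fire several pipeline requests in parallel; the URL and image name follow
# the examples in this tutorial and may need to be adjusted.
import base64
import json
from concurrent.futures import ThreadPoolExecutor

import requests

url = "http://127.0.0.1:18081/recognition/prediction"
with open("daoxiangcunjinzhubing_6.jpg", "rb") as f:
    image = base64.b64encode(f.read()).decode("utf8")
payload = json.dumps({"key": ["image"], "value": [image]})

def send(_):
    return requests.post(url=url, data=payload).json()

with ThreadPoolExecutor(max_workers=8) as pool:
    for result in pool.map(send, range(16)):
        print(result)
```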
<a name="faq"></a>
## FAQ
**Q1**: No result is returned after sending the request.
**A1**: Do not set a proxy when starting the service or sending requests. You can disable the proxy before starting the service and before sending requests. The commands to disable the proxy are:
```
unset https_proxy
unset http_proxy
```
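If unsetting the environment variables is not convenient, the proxy can also be ignored on the client side; a minimal sketch using the same requests library as `pipeline_http_client.py` (the payload below is a placeholder):

```python
# Sketch: ignore any proxy configured in the environment for this client only.
import requests

session = requests.Session()
session.trust_env = False  # do not pick up http_proxy / https_proxy environment variables
resp = session.post("http://127.0.0.1:18081/recognition/prediction", data="{}")
print(resp.status_code)
```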


@ -0,0 +1,174 @@
# Product recognition service deployment based on PaddleServing
([English](./README.md) | Simplified Chinese)
Taking product recognition as an example, this document introduces how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy a pipeline online service for PaddleClas dygraph models.
Compared with hubserving deployment, PaddleServing has the following advantages:
- Supports highly concurrent and efficient communication between clients and servers
- Supports industrial-grade serving capabilities, such as model management, online loading and online A/B testing
- Supports clients developed in multiple programming languages, such as C++, Python and Java
For more on the PaddleServing deployment framework and its tutorials, refer to the [document](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md).
## Contents
- [Environment preparation](#环境准备)
- [Model conversion](#模型转换)
- [Paddle Serving pipeline deployment](#部署)
- [FAQ](#FAQ)
<a name="环境准备"></a>
## Environment preparation
Both the PaddleClas and PaddleServing operating environments need to be prepared.
- Prepare the PaddleClas [operating environment](../../docs/zh_CN/tutorials/install.md); download the paddle whl package matching your environment (version 2.1.0 is recommended)
- The steps to prepare the PaddleServing operating environment are as follows:
1. Install paddle-serving-server, which is used to start the service:
```
pip3 install paddle-serving-server==0.6.1 # for CPU
pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
# For other GPU environments, confirm the environment before choosing one of the following commands
pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
```
2. Install the client, which is used to send requests to the service.
Find the client installation package for your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md); Python 3.7 is recommended here:
```
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
```
3. Install serving-app:
```
pip3 install paddle-serving-app==0.6.1
```
**Note:** To install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
<a name="模型转换"></a>
## Model conversion
When deploying a service with PaddleServing, the saved inference model needs to be converted into a model that Serving can deploy easily.
The following assumes that the current working directory is the PaddleClas root directory.
First, download the inference model for product recognition:
```
cd deploy
# Download and unzip the product recognition model
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd models
tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
```
Next, use the installed paddle_serving_client to convert the downloaded inference model into a model format that is easy for the server to deploy.
```
# Convert the product recognition model
python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
--serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
```
After the product recognition inference model is converted, two additional folders, `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client`, will appear in the current directory, with the following structure:
```
|- product_ResNet50_vd_aliproduct_v1.0_serving/
|- __model__
|- __params__
|- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt
|- product_ResNet50_vd_aliproduct_v1.0_client
|- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt
```
Once you have the model files, you need to change the alias name in serving_server_conf.prototxt: change `alias_name` in `fetch_var` to `features`.
The modified serving_server_conf.prototxt is as follows:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
Next, download and unpack the pre-built index of the product gallery:
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
```
<a name="部署"></a>
## Paddle Serving pipeline部署
**注意:** pipeline部署方式不支持windows平台
1. 下载PaddleClas代码若已下载可跳过此步骤
```
git clone https://github.com/PaddlePaddle/PaddleClas
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
```
The paddleserving directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml                   # configuration file for starting the service
pipeline_http_client.py      # script to send pipeline prediction requests over http
pipeline_rpc_client.py       # script to send pipeline prediction requests over rpc
recognition_web_service.py   # script to start the pipeline server
```
2. Run the following command to start the service:
```
# Start the service; the running log is saved in log.txt
python3 recognition_web_service.py &>log.txt &
```
After the service is successfully started, a log similar to the following will be printed in log.txt:
![](../imgs/start_server_recog.png)
3. Send a service request:
```
python3 pipeline_http_client.py
```
After it runs successfully, the prediction results of the model will be printed in the terminal. An example of the result:
![](../imgs/results_recog.png)
Adjust the concurrency in config.yml to obtain the highest QPS:
```
op:
# concurrency: thread-level concurrency when is_thread_op=True, otherwise process-level concurrency
concurrency: 8
...
```
Multiple service requests can be sent at the same time if necessary.
The prediction performance data will be automatically written into the `PipelineServingLogs/pipeline.tracer` file.
<a name="FAQ"></a>
## FAQ
**Q1**: No result is returned after sending the request, or an output decoding error is reported.
**A1**: Do not set a proxy when starting the service or sending requests. You can disable the proxy before starting the service and before sending requests. The commands to disable the proxy are:
```
unset https_proxy
unset http_proxy
```


@ -0,0 +1,43 @@
#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own grpc server and DAG.
##When build_dag_each_worker=False, the framework sets max_workers of the main thread's grpc thread pool to worker_num.
worker_num: 1

#http port; rpc_port and http_port must not both be empty. When rpc_port is valid and http_port is empty, http_port is not generated automatically.
http_port: 18081
rpc_port: 9994

dag:
    #op resource type: True for the thread model, False for the process model
    is_thread_op: False
op:
    rec:
        #concurrency: thread-level when is_thread_op=True, otherwise process-level
        concurrency: 1

        #when the op configuration has no server_endpoints, the local service configuration is read from local_service_conf
        local_service_conf:

            #model path
            model_config: ../../models/product_ResNet50_vd_aliproduct_v1.0_serving

            #hardware type: if empty, decided by devices (CPU/GPU); 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
            device_type: 1

            #hardware IDs: when devices is "" or omitted, prediction runs on CPU; when devices is "0" or "0,1,2", prediction runs on the listed GPU cards
            devices: "0" # "0,1"

            #client type: brpc, grpc or local_predictor. local_predictor does not start a Serving service and predicts in-process
            client_type: local_predictor

            #fetch list, based on the alias_name of fetch_var in client_config
            fetch_list: ["features"]
    det:
        concurrency: 1
        local_service_conf:
            client_type: local_predictor
            device_type: 1
            devices: '0'
            fetch_list:
            - save_infer_model/scale_0.tmp_1
            model_config: ../../models/ppyolov2_r50vd_dcn_mainbody_v1.0_serving/

(new binary image added, 44 KiB)


@ -0,0 +1,2 @@
foreground
background


@ -0,0 +1,21 @@
import requests
import json
import base64
import os

imgpath = "daoxiangcunjinzhubing_6.jpg"

def cv2_to_base64(image):
    return base64.b64encode(image).decode('utf8')

if __name__ == "__main__":
    url = "http://127.0.0.1:18081/recognition/prediction"
    with open(os.path.join(".", imgpath), 'rb') as file:
        image_data1 = file.read()
    image = cv2_to_base64(image_data1)
    data = {"key": ["image"], "value": [image]}
    for i in range(1):
        r = requests.post(url=url, data=json.dumps(data))
        print(r.json())


@ -0,0 +1,34 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
try:
    from paddle_serving_server_gpu.pipeline import PipelineClient
except ImportError:
    from paddle_serving_server.pipeline import PipelineClient
import base64

client = PipelineClient()
client.connect(['127.0.0.1:9994'])
imgpath = "daoxiangcunjinzhubing_6.jpg"

def cv2_to_base64(image):
    return base64.b64encode(image).decode('utf8')

if __name__ == "__main__":
    with open(imgpath, 'rb') as file:
        image_data = file.read()
    image = cv2_to_base64(image_data)
    for i in range(1):
        ret = client.predict(feed_dict={"image": image}, fetch=["result"])
        print(ret)


@ -0,0 +1,198 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.web_service import WebService, Op

import logging
import numpy as np
import sys
import cv2
from paddle_serving_app.reader import *
import base64
import os
import faiss
import pickle
import json


class DetOp(Op):
    def init_op(self):
        self.img_preprocess = Sequential([
            BGR2RGB(), Div(255.0),
            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
            Resize((640, 640)), Transpose((2, 0, 1))
        ])
        self.img_postprocess = RCNNPostprocess("label_list.txt", "output")
        self.threshold = 0.2
        self.max_det_results = 5

    def generate_scale(self, im):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
        Returns:
            im_scale_x: the resize ratio of X
            im_scale_y: the resize ratio of Y
        """
        target_size = [640, 640]
        origin_shape = im.shape[:2]
        resize_h, resize_w = target_size
        im_scale_y = resize_h / float(origin_shape[0])
        im_scale_x = resize_w / float(origin_shape[1])
        return im_scale_y, im_scale_x

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        imgs = []
        raw_imgs = []
        for key in input_dict.keys():
            data = base64.b64decode(input_dict[key].encode('utf8'))
            raw_imgs.append(data)
            data = np.fromstring(data, np.uint8)
            raw_im = cv2.imdecode(data, cv2.IMREAD_COLOR)
            im_scale_y, im_scale_x = self.generate_scale(raw_im)
            im = self.img_preprocess(raw_im)
            imgs.append({
                "image": im[np.newaxis, :],
                "im_shape": np.array(list(im.shape[1:])).reshape(-1)[np.newaxis, :],
                "scale_factor": np.array([im_scale_y, im_scale_x]).astype('float32'),
            })
        self.raw_img = raw_imgs
        feed_dict = {
            "image": np.concatenate([x["image"] for x in imgs], axis=0),
            "im_shape": np.concatenate([x["im_shape"] for x in imgs], axis=0),
            "scale_factor": np.concatenate([x["scale_factor"] for x in imgs], axis=0)
        }
        return feed_dict, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, log_id):
        boxes = self.img_postprocess(fetch_dict, visualize=False)
        boxes.sort(key=lambda x: x["score"], reverse=True)
        boxes = filter(lambda x: x["score"] >= self.threshold, boxes[:self.max_det_results])
        boxes = list(boxes)
        for i in range(len(boxes)):
            boxes[i]["bbox"][2] += boxes[i]["bbox"][0] - 1
            boxes[i]["bbox"][3] += boxes[i]["bbox"][1] - 1
        result = json.dumps(boxes)
        res_dict = {"bbox_result": result, "image": self.raw_img}
        return res_dict, None, ""


class RecOp(Op):
    def init_op(self):
        self.seq = Sequential([
            BGR2RGB(), Resize((224, 224)),
            Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225],
                                False), Transpose((2, 0, 1))
        ])

        index_dir = "../../recognition_demo_data_v1.1/gallery_product/index"
        assert os.path.exists(os.path.join(
            index_dir, "vector.index")), "vector.index not found ..."
        assert os.path.exists(os.path.join(
            index_dir, "id_map.pkl")), "id_map.pkl not found ... "

        self.searcher = faiss.read_index(
            os.path.join(index_dir, "vector.index"))

        with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
            self.id_map = pickle.load(fd)

        self.rec_nms_thresold = 0.05
        self.rec_score_thres = 0.5
        self.feature_normalize = True
        self.return_k = 1

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        raw_img = input_dict["image"][0]
        data = np.frombuffer(raw_img, np.uint8)
        origin_img = cv2.imdecode(data, cv2.IMREAD_COLOR)
        dt_boxes = input_dict["bbox_result"]
        boxes = json.loads(dt_boxes)
        boxes.append({
            "category_id": 0,
            "score": 1.0,
            "bbox": [0, 0, origin_img.shape[1], origin_img.shape[0]]
        })
        self.det_boxes = boxes

        #construct batch images for rec
        imgs = []
        for box in boxes:
            box = [int(x) for x in box["bbox"]]
            im = origin_img[box[1]:box[3], box[0]:box[2]].copy()
            img = self.seq(im)
            imgs.append(img[np.newaxis, :].copy())

        input_imgs = np.concatenate(imgs, axis=0)
        return {"x": input_imgs}, False, None, ""

    def nms_to_rec_results(self, results, thresh=0.1):
        filtered_results = []
        x1 = np.array([r["bbox"][0] for r in results]).astype("float32")
        y1 = np.array([r["bbox"][1] for r in results]).astype("float32")
        x2 = np.array([r["bbox"][2] for r in results]).astype("float32")
        y2 = np.array([r["bbox"][3] for r in results]).astype("float32")
        scores = np.array([r["rec_scores"] for r in results])

        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]
        while order.size > 0:
            i = order[0]
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]
            filtered_results.append(results[i])
        return filtered_results

    def postprocess(self, input_dicts, fetch_dict, log_id):
        batch_features = fetch_dict["features"]

        if self.feature_normalize:
            feas_norm = np.sqrt(
                np.sum(np.square(batch_features), axis=1, keepdims=True))
            batch_features = np.divide(batch_features, feas_norm)

        scores, docs = self.searcher.search(batch_features, self.return_k)

        results = []
        for i in range(scores.shape[0]):
            pred = {}
            if scores[i][0] >= self.rec_score_thres:
                pred["bbox"] = [int(x) for x in self.det_boxes[i]["bbox"]]
                pred["rec_docs"] = self.id_map[docs[i][0]].split()[1]
                pred["rec_scores"] = scores[i][0]
                results.append(pred)

        #do nms
        results = self.nms_to_rec_results(results, self.rec_nms_thresold)
        return {"result": str(results)}, None, ""


class RecognitionService(WebService):
    def get_pipeline_response(self, read_op):
        det_op = DetOp(name="det", input_ops=[read_op])
        rec_op = RecOp(name="rec", input_ops=[det_op])
        return rec_op


product_recog_service = RecognitionService(name="recognition")
product_recog_service.prepare_pipeline_config("config.yml")
product_recog_service.run_service()


@ -81,12 +81,14 @@ class Topk(object):
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None):
def __call__(self, x, file_names=None, multilabel=False):
if file_names is not None:
assert x.shape[0] == len(file_names)
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32")
index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
"int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
@ -108,6 +110,14 @@ class Topk(object):
return y
class MultiLabelTopk(Topk):
def __init__(self, topk=1, class_id_map_file=None):
super().__init__()
def __call__(self, x, file_names=None):
return super().__call__(x, file_names, multilabel=True)
class SavePreLabel(object):
def __init__(self, save_dir):
if save_dir is None:
@ -128,23 +138,24 @@ class SavePreLabel(object):
os.makedirs(output_dir, exist_ok=True)
shutil.copy(image_file, output_dir)
class Binarize(object):
def __init__(self, method = "round"):
def __init__(self, method="round"):
self.method = method
self.unit = np.array([[128, 64, 32, 16, 8, 4, 2, 1]]).T
def __call__(self, x, file_names=None):
if self.method == "round":
x = np.round(x + 1).astype("uint8") - 1
if self.method == "sign":
x = ((np.sign(x) + 1) / 2).astype("uint8")
embedding_size = x.shape[1]
assert embedding_size % 8 == 0, "The Binary index only support vectors with sizes multiple of 8"
byte = np.zeros([x.shape[0], embedding_size // 8], dtype=np.uint8)
for i in range(embedding_size // 8):
byte[:, i:i+1] = np.dot(x[:, i * 8: (i + 1)* 8], self.unit)
byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit)
return byte


@ -71,7 +71,6 @@ class ClsPredictor(Predictor):
output_names = self.paddle_predictor.get_output_names()
output_tensor = self.paddle_predictor.get_output_handle(output_names[
0])
if self.benchmark:
self.auto_logger.times.start()
if not isinstance(images, (list, )):
@ -119,7 +118,6 @@ def main(config):
) == len(image_list):
if len(batch_imgs) == 0:
continue
batch_results = cls_predictor.predict(batch_imgs)
for number, result_dict in enumerate(batch_results):
filename = batch_names[number]


@ -19,12 +19,14 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from functools import partial
import six
import math
import random
import cv2
import numpy as np
import importlib
from PIL import Image
from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
@ -50,6 +52,50 @@ def create_operators(params):
return ops
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
logger.warning(
f"The backend of Resize only support \"cv2\" or \"PIL\". \"f{backend}\" is unavailable. Use \"cv2\" instead."
)
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
class OperatorParamError(ValueError):
""" OperatorParamError
"""
@ -87,8 +133,11 @@ class DecodeImage(object):
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size=None,
resize_short=None,
interpolation=None,
backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
@ -101,6 +150,9 @@ class ResizeImage(object):
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
@ -110,10 +162,7 @@ class ResizeImage(object):
else:
w = self.w
h = self.h
if self.interpolation is None:
return cv2.resize(img, (w, h))
else:
return cv2.resize(img, (w, h), interpolation=self.interpolation)
return self._resize_func(img, (w, h))
class CropImage(object):
@ -145,9 +194,12 @@ class CropImage(object):
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size,
scale=None,
ratio=None,
interpolation=None,
backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
@ -156,6 +208,9 @@ class RandCropImage(object):
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
@ -181,10 +236,8 @@ class RandCropImage(object):
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
if self.interpolation is None:
return cv2.resize(img, size)
else:
return cv2.resize(img, size, interpolation=self.interpolation)
return self._resize_func(img, size)
class RandFlipImage(object):
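For context, the change above routes `ResizeImage` and `RandCropImage` through the new `UnifiedResize` helper, which accepts a named interpolation and a `cv2` or `PIL` backend. A minimal usage sketch, assuming the deploy-side module path `python/preprocess.py` as used elsewhere in this commit:

```python
# Sketch: exercising the interpolation/backend options added to ResizeImage.
# The import path is an assumption (deploy/python/preprocess.py, run from deploy/).
import numpy as np
from python.preprocess import ResizeImage

img = (np.random.rand(480, 640, 3) * 255).astype("uint8")  # stand-in for a decoded image
resize_cv2 = ResizeImage(resize_short=256, interpolation="bilinear", backend="cv2")
resize_pil = ResizeImage(resize_short=256, interpolation="bicubic", backend="pil")
print(resize_cv2(img).shape, resize_pil(img).shape)
```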


@ -1,6 +1,9 @@
# classification
python3.7 python/predict_cls.py -c configs/inference_cls.yaml
# multilabel_classification
#python3.7 python/predict_cls.py -c configs/inference_multilabel_cls.yaml
# feature extractor
# python3.7 python/predict_rec.py -c configs/inference_rec.yaml


@ -24,13 +24,13 @@ Accuracy and inference time of the prtrained models based on SSLD distillation a
* Server-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| ResNet50_vd_ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) |
| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
@ -38,19 +38,44 @@ Accuracy and inference time of the prtrained models based on SSLD distillation a
* Mobile-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | 模型大小(M) | Download Address |
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Storage Size(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)
| MobileNetV1_ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)
* Intel-CPU-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| PPLCNet_x0_5_ssld | 0.661 | 0.631 | 0.030 | 2.05 | 47 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) |
| PPLCNet_x1_0_ssld | 0.744 | 0.713 | 0.033 | 2.46 | 161 | 3.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) |
| PPLCNet_x2_5_ssld | 0.808 | 0.766 | 0.042 | 5.39 | 906 | 9.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) |
* Note: `Reference Top-1 Acc` means accuracy of pretrained models which are trained on ImageNet1k dataset.
<a name="PPLCNet_series"></a>
### PPLCNet_series
Accuracy and inference time metrics of the PPLCNet series models are shown as follows. More details can be found in the [PPLCNet series tutorial](../en/models/PPLCNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | Download Address |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 1.74 | 18 | 1.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 1.92 | 29 | 1.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 2.05 | 47 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 2.29 | 99 | 2.4 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 2.46 | 161 | 3.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 3.19 | 342 | 4.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 4.27 | 590 | 6.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 5.39 | 906 | 9.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
<a name="ResNet_and_Vd_series"></a>
### ResNet and Vd series


@ -25,58 +25,68 @@ tar -xf NUS-SCENE-dataset.tar
cd ../../
```
## Environment
### Download pretrained model
You can use the following commands to download the pretrained model of ResNet50_vd.
```bash
mkdir pretrained
cd pretrained
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
cd ../
```
## Training
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
--gpus="0" \
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```
After training for 10 epochs, the best accuracy over the validation set should be around 0.72.
After training for 10 epochs, the best accuracy over the validation set should be around 0.95.
## Evaluation
```bash
python tools/eval.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml \
-o pretrained_model="./output/ResNet50_vd/best_model/ppcls" \
-o load_static_weights=False
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
The evaluation metric is mAP, which is commonly used in multilabel tasks to measure model performance. The mAP over the validation set should be around 0.57.
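For reference, a minimal sketch of how a macro-averaged mAP of this kind could be computed, assuming scikit-learn is available; the arrays are hypothetical placeholders rather than outputs of this tutorial, and the exact metric implementation used by PaddleClas may differ in detail:

```python
# Sketch: macro-averaged mAP for multilabel predictions (hypothetical data).
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])                     # ground-truth labels
y_score = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.6], [0.8, 0.4, 0.3]])  # predicted probabilities

ap_per_class = [average_precision_score(y_true[:, c], y_score[:, c])
                for c in range(y_true.shape[1])]
print("mAP:", float(np.mean(ap_per_class)))
```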
## Prediction
```bash
python tools/infer/infer.py \
-i "./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/0199_434752251.jpg" \
--model ResNet50_vd \
--pretrained_model "./output/ResNet50_vd/best_model/ppcls" \
--use_gpu True \
--load_static_weights False \
--multilabel True \
--class_num 33
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
You will get multiple outputs such as the following:
```
class id: 3, probability: 0.6025
class id: 23, probability: 0.5491
class id: 32, probability: 0.7006
```
```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```
## Prediction based on prediction engine
### Export model
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
The default path of the inference model is under the current path `./inference`
### Prediction based on prediction engine
Enter the deploy directory:
```bash
cd ./deploy
```
Prediction based on prediction engine:
```
python3 python/predict_cls.py \
-c configs/inference_multilabel_cls.yaml
```
You will get multiple outputs such as the following:
```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```


@ -22,9 +22,8 @@ The feature learning config file description can be found in [yaml description](
The following are the pretrained models trained on different dataset.
- Vehicle Fine-Grained Classification[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.1_pretrained.pdparams)
- Vehicle ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.0_pretrained.pdparams)
- Vehicle Fine-Grained Classification[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams)
- Vehicle ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams)
- Cartoon Character Recognition[iCartoon](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/cartoon_rec_ResNet50_iCartoon_v1.0_pretrained.pdparams)
- Logo Recognition[Logo 3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.0_pretrained.pdparams)
- Product Recognition [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.0.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)
- Logo Recognition[Logo 3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.1_pretrained.pdparams)
- Product Recognition [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.1.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)


@ -0,0 +1,41 @@
# PPLCNet series
## Overview
The PPLCNet series is a family of networks with excellent performance on Intel CPUs, proposed by the Baidu PaddleCV team. The authors summarize several methods that improve model accuracy on Intel CPUs while barely increasing inference time, and combine them into a new network, namely PPLCNet. Compared with other lightweight networks, PPLCNet achieves higher accuracy at the same inference time. PPLCNet has shown strong competitiveness in image classification, object detection, and semantic segmentation.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |
## Inference speed based on Intel(R)-Xeon(R)-Gold-6148-CPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |


@ -14,13 +14,13 @@ After preparing the configuration file, The training process can be started in t
```
python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="" \
-o use_gpu=False
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=False \
-o Global.device=gpu
```
Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters needed to be modified or added, `-o pretrained_model=""` means to not using pre-trained models.
`-o use_gpu=True` means to use GPU for training. If you want to use the CPU for training, you need to set `use_gpu` to `False`.
Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters that need to be modified or added, and `-o Arch.pretrained=False` means not to use pre-trained models.
`-o Global.device=gpu` means to use GPU for training. If you want to use the CPU for training, you need to set `Global.device` to `cpu`.
Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config_description_en.md).
@ -54,12 +54,12 @@ After configuring the configuration file, you can finetune it by loading the pre
```
python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained" \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=True \
-o Global.device=gpu
```
Among them, `-o pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
Among them, `-o Arch.pretrained` is used to set the path from which to load the pretrained weights. When using it, replace it with your own pretrained weights' path, or modify the path directly in the configuration file. You can also set it to `True` to use pretrained weights trained on ImageNet-1k.
We also provide a lot of pre-trained models trained on the ImageNet-1k dataset. For the model list and download address, please refer to the [model library overview](../models/models_intro_en.md).
@ -69,28 +69,26 @@ If the training process is terminated for some reasons, you can also load the ch
```
python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-o last_epoch=5 \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o Global.device=gpu
```
The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
**Note**:
* The parameter `-o last_epoch=5` means to record the number of the last training epoch as `5`, that is, the number of this training epoch starts from `6`, and the parameter defaults to `-1`, which means the number of this training epoch starts from `0`.
* The `-o checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints as shown below during the training process. If you want to continue training from the epoch `5`, Just set the `checkpoints` to `./output/MobileNetV3_large_x1_0_gpupaddle/5/ppcls`, PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.
* The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints as shown below during the training process. If you want to continue training from the epoch `5`, Just set the `Global.checkpoints` to `../output/MobileNetV3_large_x1_0/epoch_5`, PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.
```shell
output/
── MobileNetV3_large_x1_0
├── 0
│ ├── ppcls.pdopt
│ └── ppcls.pdparams
├── 1
│ ├── ppcls.pdopt
│ └── ppcls.pdparams
output
── MobileNetV3_large_x1_0
│ ├── best_model.pdopt
│ ├── best_model.pdparams
│ ├── best_model.pdstates
│ ├── epoch_1.pdopt
│ ├── epoch_1.pdparams
│ ├── epoch_1.pdstates
.
.
.
@ -103,18 +101,15 @@ The model evaluation process can be started as follows.
```bash
python tools/eval.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-o load_static_weights=False
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
The above command will use `./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model/ppcls`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
The above command will use `./configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
Some of the configurable evaluation parameters are described as follows:
* `ARCHITECTURE.name`: Model name
* `pretrained_model`: The path of the model file to be evaluated
* `load_static_weights`: Whether the model to be evaluated is a static graph model
* `Arch.name`: Model name
* `Global.pretrained_model`: The path of the model file to be evaluated
**Note:** If the model is a dygraph type, you only need to specify the prefix of the model file when loading the model, instead of specifying the suffix, such as [1.3 Resume Training](#13-resume-training).
@ -125,26 +120,15 @@ If you want to run PaddleClas on Linux with GPU, it is highly recommended to use
### 2.1 Model training
After preparing the configuration file, The training process can be started in the following way. `paddle.distributed.launch` specifies the GPU running card number by setting `selected_gpus`:
After preparing the configuration file, the training process can be started in the following way. `paddle.distributed.launch` specifies the GPU card numbers to run on by setting `gpus`:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
```
The configuration can be updated by adding the `-o` parameter.
```bash
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="" \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
```
The format of output log information is the same as above, see [1.1 Model training](#11-model-training) for details.
@ -156,14 +140,14 @@ After configuring the configuration file, you can finetune it by loading the pre
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained"
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=True
```
Among them, `pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
Among them, `Arch.pretrained` can be set to `True` or `False`; it can also be used to set the path from which to load pretrained weights. When using your own weights, replace it with your own path, or modify the path directly in the configuration file.
There contains a lot of examples of model finetuning in [Quick Start](./quick_start_en.md). You can refer to this tutorial to finetune the model on a specific dataset.
@ -175,26 +159,26 @@ If the training process is terminated for some reasons, you can also load the ch
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-o last_epoch=5 \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o Global.device=gpu
```
The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter. About `last_epoch` parameter, please refer [1.3 Resume training](#13-resume-training) for details.
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter as described in [1.3 Resume training](#13-resume-training).
### 2.4 Model evaluation
The model evaluation process can be started as follows.
```bash
python tools/eval.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-o load_static_weights=False
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
About parameter description, see [1.4 Model evaluation](#14-model-evaluation) for details.
@ -204,30 +188,16 @@ About parameter description, see [1.4 Model evaluation](#14-model-evaluation) fo
After the training is completed, you can make predictions using the model obtained from training, as follows:
```bash
python tools/infer/infer.py \
-i image path \
--model MobileNetV3_large_x1_0 \
--pretrained_model "./output/MobileNetV3_large_x1_0/best_model/ppcls" \
--use_gpu True \
--load_static_weights False
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
Among them:
+ `image_file`(i): The path of the image file to be predicted, such as `./test.jpeg`;
+ `model`: Model name, such as `MobileNetV3_large_x1_0`;
+ `pretrained_model`: Weight file path, such as `./pretrained/MobileNetV3_large_x1_0_pretrained/`;
+ `use_gpu`: Whether to use the GPU, default by `True`;
+ `load_static_weights`: Whether to load the pre-trained model obtained from static image training, default by `False`;
+ `resize_short`: The length of the shortest side of the image that be scaled proportionally, default by `256`;
+ `resize`: The side length of the image that be center cropped from resize_shorted image, default by `224`;
+ `pre_label_image`: Whether to pre-label the image data, default value: `False`;
+ `pre_label_out_idr`: The output path of pre-labeled image data. When `pre_label_image=True`, many subfolders will be generated under this path; each subfolder represents a category and stores all the images that the model predicts to belong to that category.
+ `Infer.infer_imgs`: The path of the image file or folder to be predicted;
+ `Global.pretrained_model`: Weight file path, such as `./output/MobileNetV3_large_x1_0/best_model`;
**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.
For more detailed information, you can refer to [infer.py](../../../tools/infer/infer.py).
<a name="model_inference"></a>
## 4. Use the inference model to predict
PaddlePaddle supports inference using prediction engines, which will be introduced next.
@ -235,41 +205,38 @@ PaddlePaddle supports inference using prediction engines, which will be introduc
Firstly, you should export inference model using `tools/export_model.py`.
```bash
python tools/export_model.py \
--model MobileNetV3_large_x1_0 \
--pretrained_model ./output/MobileNetV3_large_x1_0/best_model/ppcls \
--output_path ./inference \
--class_dim 1000
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
```
Among them, the `--model` parameter is used to specify the model name, the `--pretrained_model` parameter is used to specify the model file path (the path does not need to include the model file suffix), `--output_path` is used to specify the storage path of the converted model, and `class_dim` means the number of classes of the model, 1000 by default.
**Note**:
1. If `--output_path=./inference`, then three files will be generated in the folder `inference`, they are `inference.pdiparams`, `inference.pdmodel` and `inference.pdiparams.info`.
2. You can specify the `shape` of the model input image by setting the parameter `--img_size`, the default is `224`, which means the shape of input image is `224*224`. If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, you need to set `--img_size=384`.
Among them, `Global.pretrained_model` parameter is used to specify the model file path that does not need to include the file suffix name.
The above command will generate the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), and then the inference engine can be used for inference:
Go to the deploy directory:
```
cd deploy
```
Use the inference engine to run inference. Because the mapping file of the ImageNet1k dataset is used by default, we should set `PostProcess.Topk.class_id_map_file` to `None`.
```bash
python tools/infer/predict.py \
--image_file image path \
--model_file "./inference/inference.pdmodel" \
--params_file "./inference/inference.pdiparams" \
--use_gpu=True \
--use_tensorrt=False
python3 python/predict_cls.py \
-c configs/inference_cls.yaml \
-o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
-o Global.inference_model_dir=../inference/ \
-o PostProcess.Topk.class_id_map_file=None
```
Among them:
+ `image_file`: The path of the image file to be predicted, such as `./test.jpeg`;
+ `model_file`: Model file path, such as `./MobileNetV3_large_x1_0/inference.pdmodel`;
+ `params_file`: Weight file path, such as `./MobileNetV3_large_x1_0/inference.pdiparams`;
+ `use_tensorrt`: Whether to use TensorRT, `True` by default;
+ `use_gpu`: Whether to use the GPU, `True` by default;
+ `enable_mkldnn`: Whether to use `MKL-DNN`, `False` by default. When both `use_gpu` and `enable_mkldnn` are set to `True`, the GPU is used and `enable_mkldnn` is ignored;
+ `resize_short`: The length of the shortest side after the image is scaled proportionally, `256` by default;
+ `resize`: The side length of the image center-cropped from the resized image, `224` by default;
+ `enable_calc_topk`: Whether to calculate the top-k accuracy of the prediction, `False` by default. The top-k accuracy is printed when set to `True`;
+ `gt_label_path`: Image name and label file, used when `enable_calc_topk` is `True` to get the image list and labels.
+ `Global.infer_imgs`: The path of the image file to be predicted;
+ `Global.inference_model_dir`: The directory of the inference model files, such as `../inference/`;
+ `Global.use_tensorrt`: Whether to use TensorRT, `False` by default;
+ `Global.use_gpu`: Whether to use the GPU, `True` by default;
+ `Global.enable_mkldnn`: Whether to use `MKL-DNN`, `False` by default. It only takes effect when `Global.use_gpu` is `False`.
+ `Global.use_fp16`: Whether to enable FP16, default by `False`;
**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.
If you want to evaluate the speed of the model, it is recommended to use [predict.py](../../../tools/infer/predict.py), and enable TensorRT to accelerate.
If you want to evaluate the speed of the model, it is recommended to enable TensorRT for acceleration on GPU, and MKL-DNN on CPU.
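For reference, the sketch below shows roughly how these switches map to the underlying Paddle Inference Python API when benchmarking an exported classification model. It is only a minimal sketch: the model paths and the random dummy input are placeholders, `python/predict_cls.py` already wraps this logic for you, and the exact TensorRT options may differ across Paddle versions.

```python
import numpy as np
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths; point these to your exported inference model files.
config = Config("../inference/inference.pdmodel", "../inference/inference.pdiparams")

use_gpu = True
if use_gpu:
    config.enable_use_gpu(8000, 0)              # initial GPU memory (MB), GPU id
    config.enable_tensorrt_engine(              # optional TensorRT acceleration
        workspace_size=1 << 30,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=PrecisionType.Float32,
        use_static=False,
        use_calib_mode=False)
else:
    config.disable_gpu()
    config.enable_mkldnn()                      # optional MKL-DNN acceleration on CPU
    config.set_cpu_math_library_num_threads(10)

predictor = create_predictor(config)
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))  # dummy input
predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print(output_handle.copy_to_cpu().shape)
```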

View File

@ -120,7 +120,7 @@ python3 tools/train.py \
`-c` is used to specify the path to the configuration file, and `-o` is used to specify the parameters that need to be modified or added, where `-o Arch.Backbone.pretrained=True` indicates that the Backbone part uses the pre-trained model, in addition, `Arch.Backbone.pretrained` can also specify backbone.`pretrained` can also specify the address of a specific model weight file, which needs to be replaced with the path to your own pre-trained model weight file when using it. `-o Global.device=gpu` indicates that the GPU is used for training. If you want to use a CPU for training, you need to set `Global.device` to `cpu`.
For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_en.md) for specific configuration parameters.
For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_description_en.md) for specific configuration parameters.
Run the above commands to check the output log, an example is as follows:

View File

@ -43,6 +43,11 @@ The detection model with the recognition inference model for the 4 directions (L
| Product Recognition Model | Product Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
| Vehicle ReID Model | Vehicle ReID Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | - | - |
| Models Introduction | Recommended Scenarios | inference Model | Predict Config File | Config File to Build Index Database |
| ------------ | ------------- | -------- | ------- | -------- |
| Lightweight generic mainbody detection model | General Scenarios |[Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | - | - |
| Lightweight generic recognition model | General Scenarios | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
Demo data in this tutorial can be downloaded here: [download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar).
@ -50,6 +55,7 @@ Demo data in this tutorial can be downloaded here: [download link](https://paddl
**Attention**
1. If you do not have wget installed on Windows, you can download the model by copying the link into your browser and unzipping it in the appropriate folder; for Linux or macOS users, you can right-click and copy the download link to download it via the `wget` command.
2. If you want to install `wget` on macOS, you can run the following command.
3. The predict config file and the index-building config file of the lightweight generic recognition model currently reuse those of the server-side product recognition model. You can modify the model path in them to complete the index building and prediction.
```shell
# install homebrew
@ -123,6 +129,13 @@ The `models` folder should have the following file structure.
│ └── inference.pdmodel
```
**Attention**
If you want to use the lightweight generic recognition model, you need to re-extract the features of the demo data and rebuild the index, as follows:
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer
```
<a name="Product_recognition_and_retrival"></a>
### 2.2 Product Recognition and Retrieval

Binary file not shown.

Before

Width:  |  Height:  |  Size: 58 KiB

After

Width:  |  Height:  |  Size: 60 KiB

View File

@ -31,9 +31,9 @@
| 模型 | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| ResNet50_vd_ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
@ -45,16 +45,44 @@
| 模型 | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址 |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
| MobileNetV1_ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
* Intel CPU端知识蒸馏模型
| 模型 | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M) | 下载地址 |
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
| PPLCNet_x0_5_ssld | 0.661 | 0.631 | 0.030 | 2.05 | 47 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) |
| PPLCNet_x1_0_ssld | 0.744 | 0.713 | 0.033 | 2.46 | 161 | 3.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) |
| PPLCNet_x2_5_ssld | 0.808 | 0.766 | 0.042 | 5.39 | 906 | 9.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) |
* 注: `Reference Top-1 Acc`表示PaddleClas基于ImageNet1k数据集训练得到的预训练模型精度。
<a name="PPLCNet系列"></a>
### PPLCNet系列
PPLCNet系列模型的精度、速度指标如下表所示更多关于该系列的模型介绍可以参考[PPLCNet系列模型文档](./models/PPLCNet.md)。
| 模型 | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | 下载地址 |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 1.74 | 18 | 1.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 1.92 | 29 | 1.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 2.05 | 47 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 2.29 | 99 | 2.4 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 2.46 | 161 | 3.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 3.19 | 342 | 4.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 4.27 | 590 | 6.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 5.39 | 906 | 9.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
<a name="ResNet及其Vd系列"></a>
### ResNet及其Vd系列
@ -429,7 +457,7 @@ ViTVision Transformer与DeiTData-efficient Image Transformers系列
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| TNT_small | 0.8121 |0.9563 | | | 5.2 | 23.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams) |
**注**TNT模型的数据预处理部分`NormalizeImage`中的`mean`与`std`均为0.5。

View File

@ -25,58 +25,66 @@ tar -xf NUS-SCENE-dataset.tar
cd ../../
```
## 二、环境准备
### 2.1 下载预训练模型
本例展示基于ResNet50_vd模型的多标签分类流程因此首先下载ResNet50_vd的预训练模型
```bash
mkdir pretrained
cd pretrained
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
cd ../
```
## 三、模型训练
## 二、模型训练
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
--gpus="0" \
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```
训练10epoch之后验证集最好的正确率应该在0.72左右。
训练10epoch之后验证集最好的正确率应该在0.95左右。
## 四、模型评估
## 三、模型评估
```bash
python tools/eval.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml \
-o pretrained_model="./output/ResNet50_vd/best_model/ppcls" \
-o load_static_weights=False
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
评估指标采用mAP验证集的mAP应该在0.57左右。
## 五、模型预测
## 四、模型预测
```bash
python tools/infer/infer.py \
-i "./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/0199_434752251.jpg" \
--model ResNet50_vd \
--pretrained_model "./output/ResNet50_vd/best_model/ppcls" \
--use_gpu True \
--load_static_weights False \
--multilabel True \
--class_num 33
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
得到类似下面的输出:
```
class id: 3, probability: 0.6025
class id: 23, probability: 0.5491
class id: 32, probability: 0.7006
```
```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```
## 五、基于预测引擎预测
### 5.1 导出inference model
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
inference model的路径默认在当前路径下`./inference`
### 5.2 基于预测引擎预测
首先进入deploy目录下
```bash
cd ./deploy
```
通过预测引擎推理预测:
```
python3 python/predict_cls.py \
-c configs/inference_multilabel_cls.yaml
```
得到类似下面的输出:
```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```

View File

@ -22,8 +22,8 @@
以下为各应用在不同数据集下的预训练模型
- 车辆细分类:[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.1_pretrained.pdparams)
- 车辆ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.0_pretrained.pdparams)
- 车辆细分类:[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams)
- 车辆ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams)
- 动漫人物识别:[iCartoon](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/cartoon_rec_ResNet50_iCartoon_v1.0_pretrained.pdparams)
- Logo识别[Logo3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.0_pretrained.pdparams)
- 商品识别: [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.0.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)
- Logo识别[Logo3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.1_pretrained.pdparams)
- 商品识别: [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.1.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)

View File

@ -7,7 +7,7 @@
* 图像分类、识别、检索领域大佬众多,模型和论文更新速度也很快,本文档回答主要依赖有限的项目实践,难免挂一漏万,如有遗漏和不足,也希望有识之士帮忙补充和修正,万分感谢。
## 目录
* [近期更新](#近期更新)(2021.08.11)
* [近期更新](#近期更新)(2021.09.08)
* [精选](#精选)
* [1. 理论篇](#1.理论篇)
* [1.1 PaddleClas基础知识](#1.1PaddleClas基础知识)
@ -27,60 +27,69 @@
<a name="近期更新"></a>
## 近期更新
#### Q2.6.2: 导出inference模型进行预测部署准确率异常为什么呢
**A**: 该问题通常是由于在导出时未能正确加载模型参数导致的,首先检查模型导出时的日志,是否存在类似下述内容:
#### Q2.1.7: 在训练时,出现如下报错信息:`ERROR: Unexpected segmentation fault encountered in DataLoader workers.`,如何排查解决问题呢?
**A**:尝试将训练配置文件中的字段 `num_workers` 设置为 `0`;尝试将训练配置文件中的字段 `batch_size` 调小一些;检查数据集格式和配置文件中的数据集路径是否正确。
#### Q2.1.8: 如何在训练时使用 `Mixup``Cutmix`
**A**
* `Mixup` 的使用方法请参考 [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65)`Cutmix` 请参考 [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65)。
* 在使用 `Mixup``Cutmix` 时,需要注意:
* 配置文件中的 `Loss.Train.CELoss` 需要修改为 `Loss.Train.MixCELoss`,可参考 [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26)
* 使用 `Mixup``Cutmix` 做训练时无法计算训练的精度Acc指标因此需要在配置文件中取消 `Metric.Train.TopkAcc` 字段,可参考 [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128)。
#### Q2.1.9: 训练配置yaml文件中字段 `Global.pretrain_model``Global.checkpoints` 分别用于配置什么呢?
**A**
* 当需要 `fine-tune` 时,可以通过字段 `Global.pretrain_model` 配置预训练模型权重文件的路径,预训练模型权重文件后缀名通常为 `.pdparams`
* 在训练过程中训练程序会自动保存每个epoch结束时的断点信息包括优化器信息 `.pdopt` 和模型权重信息 `.pdparams`。在训练过程意外中断等情况下,需要恢复训练时,可以通过字段 `Global.checkpoints` 配置训练过程中保存的断点信息文件,例如通过配置 `checkpoints: ./output/ResNet18/epoch_18` 即可恢复18epoch训练结束时的断点信息PaddleClas将自动加载 `epoch_18.pdopt``epoch_18.pdparams`从19epoch继续训练。
#### Q2.6.3: 如何将模型转为 `ONNX` 格式?
**A**Paddle支持两种转ONNX格式模型的方式且依赖于 `paddle2onnx` 工具,首先需要安装 `paddle2onnx`
```shell
pip install paddle2onnx
```
UserWarning: Skip loading for ***. *** is not found in the provided dict.
```
如果存在,则说明模型权重未能加载成功,请进一步检查配置文件中的 `Global.pretrained_model` 字段,是否正确配置了模型权重文件的路径。模型权重文件后缀名通常为 `pdparams`,注意在配置该路径时无需填写文件后缀名。
#### Q2.1.4: 数据预处理中,不想对输入数据进行裁剪,该如何设置?或者如何设置剪裁的尺寸。
**A**: PaddleClas 支持的数据预处理算子可在这里查看:`ppcls/data/preprocess/__init__.py`,所有支持的算子均可在配置文件中进行配置,配置的算子名称需要和算子类名一致,参数与对应算子类的构造函数参数一致。如不需要对图像裁剪,则可去掉 `CropImage`、`RandCropImage`,使用 `ResizeImage` 替换即可可通过其参数设置不同的resize方式 使用 `size` 参数则直接将图像缩放至固定大小,使用`resize_short` 参数则会维持图像宽高比进行缩放。设置裁剪尺寸时,可通过 `CropImage` 算子的 `size` 参数,或 `RandCropImage` 算子的 `size` 参数。
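下面给出一个示意性的组合(仅为示意,假设这些算子可以按如下方式从 `ppcls.data.preprocess` 导入,构造参数以 `ppcls/data/preprocess` 中的实际实现为准),展示“不裁剪、仅按短边等比缩放”的预处理写法:

```python
from ppcls.data.preprocess import ResizeImage, NormalizeImage, ToCHWImage  # 假设的导入方式

# 去掉 CropImage / RandCropImage只保留按短边等比缩放
transforms = [
    ResizeImage(resize_short=256),      # 维持宽高比,短边缩放到 256若改用 size=224 则直接缩放到固定大小
    NormalizeImage(scale=1.0 / 255.0,
                   mean=[0.485, 0.456, 0.406],
                   std=[0.229, 0.224, 0.225],
                   order=''),
    ToCHWImage(),
]

def preprocess(img):
    # img 为 HWC 排布的 numpy 数组,依次经过各算子处理
    for t in transforms:
        img = t(img)
    return img
```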
* 从 inference model 转为 ONNX 格式模型:
#### Q1.1.3: Momentum 优化器中的 momentum 参数是什么意思呢?
**A**: Momentum 优化器是在 SGD 优化器的基础上引入了“动量”的概念。在 SGD 优化器中,在 `t+1` 时刻,参数 `w` 的更新可表示为:
```latex
w_{t+1} = w_t - lr * grad
```
其中,`lr` 为学习率,`grad` 为此时参数 `w` 的梯度。在引入动量的概念后,参数 `w` 的更新可表示为:
```latex
v_{t+1} = m * v_t + lr * grad
w_{t+1} = w_t - v_{t+1}
```
其中,`m` 即为动量 `momentum`,表示累积动量的加权值,一般取 `0.9`,当取值小于 `1` 时,则越早期的梯度对当前的影响越小,例如,当动量参数 `m``0.9` 时,在 `t` 时刻,`t-5` 的梯度加权值为 `0.9 ^ 5 = 0.59049`,而 `t-2` 时刻的梯度加权值为 `0.9 ^ 2 = 0.81`。因此,太过“久远”的梯度信息对当前的参考意义很小,而“最近”的历史梯度信息对当前影响更大,这也是符合直觉的。
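下面用一小段与框架无关的示意代码模拟上述更新过程(梯度序列为假设值,仅作演示),可以直观看到历史梯度按 `m` 的幂次衰减累积:

```python
# 模拟带动量的 SGD 更新v_{t+1} = m * v_t + lr * grad; w_{t+1} = w_t - v_{t+1}
m, lr = 0.9, 0.1
w, v = 1.0, 0.0
grads = [0.5, 0.4, 0.3, 0.2, 0.1]   # 假设的一串梯度

for t, grad in enumerate(grads, 1):
    v = m * v + lr * grad           # 累积动量
    w = w - v                       # 更新参数
    print(f"step {t}: v = {v:.4f}, w = {w:.4f}")

# 往前第 k 步的梯度对当前动量的贡献权重约为 m ** k
print("t-5 的梯度权重:", 0.9 ** 5)   # 0.59049,与上文一致
```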
以动态图导出的 `combined` 格式 inference model包含 `.pdmodel``.pdiparams` 两个文件)为例,使用以下命令进行模型格式转换:
```shell
paddle2onnx --model_dir ${model_path} --model_filename ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
```
上述命令中:
* `model_dir`:该参数下需要包含 `.pdmodel``.pdiparams` 两个文件;
* `model_filename`:该参数用于指定参数 `model_dir` 下的 `.pdmodel` 文件路径;
* `params_filename`:该参数用于指定参数 `model_dir` 下的 `.pdiparams` 文件路径;
* `save_file`:该参数用于指定转换后的模型保存目录路径。
<div align="center">
<img src="../../images/faq/momentum.jpeg" width="400">
</div>
关于静态图导出的非 `combined` 格式的 inference model通常包含文件 `__model__` 和多个参数文件)转换模型格式,以及更多参数说明请参考 paddle2onnx 官方文档 [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9)。
*该图来自 `https://blog.csdn.net/tsyccnh/article/details/76270707`*
* 直接从模型组网代码导出ONNX格式模型
通过引入动量的概念,在参数更新时考虑了历史更新的影响,因此可以加快收敛速度,也改善了 `SGD` 优化器带来的损失cost、loss震荡问题。
以动态图模型组网代码为例,模型类为继承于 `paddle.nn.Layer` 的子类,代码如下所示:
#### Q1.1.4: PaddleClas 是否有 `Fixing the train-test resolution discrepancy` 这篇论文的实现呢?
**A**: 目前 PaddleClas 没有实现。如果需要可以尝试自己修改代码。简单来说该论文所提出的思想是使用较大分辨率作为输入对已经训练好的模型最后的FC层进行fine-tune。具体操作上首先在较低分辨率的数据集上对模型网络进行训练完成训练后对网络除最后的FC层外的其他层的权重设置参数 `stop_gradient=True`然后使用较大分辨率的输入对网络进行fine-tune训练。
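如果希望自行尝试,下面是一个示意性的写法(仅为示意,用 `paddle.vision` 中的 ResNet50 代替已在低分辨率上训练好的模型,最后分类层的属性名假设为 `fc`,实际以具体网络实现为准):

```python
import paddle
from paddle.vision.models import resnet50

# 假设:该模型已在较低分辨率(如 224x224数据上训练完成
model = resnet50(num_classes=1000)

# 除最后的全连接层外,其余参数全部停止梯度更新
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.stop_gradient = True

# 随后构建更大分辨率(如 320x320的训练数据仅对 FC 层继续 fine-tune 若干个 epoch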
```python
import paddle
from paddle.static import InputSpec
#### Q1.6.2: PaddleClas 图像识别用于 Eval 的配置文件中,`Query` 和 `Gallery` 配置具体是用于做什么呢?
**A**: `Query``Gallery` 均为数据集配置,其中 `Gallery` 用于配置底库数据,`Query` 用于配置验证集。在进行 Eval 时,首先使用模型对 `Gallery` 底库数据进行前向计算特征向量,特征向量用于构建底库,然后模型对 `Query` 验证集中的数据进行前向计算特征向量,再与底库计算召回率等指标。
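下面用 numpy 写一个极简示意,说明上述流程中 Recall@1 这类指标的计算方式(实际中特征由模型前向计算得到并做归一化,这里用随机数代替):

```python
import numpy as np

np.random.seed(0)
# 假设底库 1000 张、查询集 100 张,特征维度 512且均已 L2 归一化
gallery_feas = np.random.randn(1000, 512).astype("float32")
gallery_feas /= np.linalg.norm(gallery_feas, axis=1, keepdims=True)
gallery_labels = np.random.randint(0, 50, size=(1000,))

query_feas = np.random.randn(100, 512).astype("float32")
query_feas /= np.linalg.norm(query_feas, axis=1, keepdims=True)
query_labels = np.random.randint(0, 50, size=(100,))

# 余弦相似度矩阵:每个 query 与底库所有样本的相似度
sim = query_feas @ gallery_feas.T            # (100, 1000)
top1_idx = sim.argmax(axis=1)                # 每个 query 最相似的底库样本
recall_at_1 = (gallery_labels[top1_idx] == query_labels).mean()
print("Recall@1:", recall_at_1)
```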
class SimpleNet(paddle.nn.Layer):
def __init__(self):
pass
def forward(self, x):
pass
#### Q2.1.5: PaddlePaddle 安装后,使用报错,无法导入 paddle 下的任何模块import paddle.xxx是为什么呢
**A**: 首先可以使用以下代码测试 Paddle 是否安装正确:
```python
import paddle
paddle.utils.install_check.run_check()
```
正确安装时,通常会有如下提示:
```
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
```
如未能安装成功,则会有相应问题的提示。
另外在同时安装CPU版本和GPU版本Paddle后由于两个版本存在冲突需要将两个版本全部卸载然后重新安装所需要的版本。
net = SimpleNet()
x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
```
其中:
* `InputSpec()` 函数用于描述模型输入的签名信息,包括输入数据的 `shape`、`type` 和 `name`(可省略);
* `paddle.onnx.export()` 函数需要指定模型组网对象 `net`,导出模型的保存路径 `save_path`,模型的输入数据描述 `input_spec`
#### Q2.1.6: 使用PaddleClas训练时如何设置仅保存最优模型不想保存中间模型。
**A**: PaddleClas在训练过程中会保存/更新以下三类模型:
1. 最新的模型(`latest.pdopt` `latest.pdparams``latest.pdstates`),当训练意外中断时,可使用最新保存的模型恢复训练;
2. 最优的模型(`best_model.pdopt``best_model.pdparams``best_model.pdstates`
3. 训练过程中一个epoch结束时的断点`epoch_xxx.pdopt``epoch_xxx.pdparams``epoch_xxx.pdstates`)。训练配置文件中 `Global.save_interval` 字段表示该模型的保存间隔。将该字段设置大于总epochs数则不再保存中间断点模型。
需要注意,`paddlepaddle` 版本需大于 `2.0.0`。关于 `paddle.onnx.export()` 函数的更多参数说明请参考[paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export)。
#### Q2.5.4: 在 build 检索底库时,参数 `pq_size` 应该如何设置?
**A**`pq_size` 是PQ检索算法的参数。PQ检索算法可以简单理解为“分层”检索算法`pq_size` 是每层的“容量”因此该参数的设置会影响检索性能不过在底库总数据量不太大小于10000张的情况下这个参数对性能的影响很小因此对于大多数使用场景而言在构建底库时无需修改该参数。关于PQ检索算法的更多内容可以查看相关[论文](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf)。
<a name="精选"></a>
## 精选
@ -204,6 +213,22 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
2. 最优的模型(`best_model.pdopt``best_model.pdparams``best_model.pdstates`
3. 训练过程中一个epoch结束时的断点`epoch_xxx.pdopt``epoch_xxx.pdparams``epoch_xxx.pdstates`)。训练配置文件中 `Global.save_interval` 字段表示该模型的保存间隔。将该字段设置大于总epochs数则不再保存中间断点模型。
#### Q2.1.7: 在训练时,出现如下报错信息:`ERROR: Unexpected segmentation fault encountered in DataLoader workers.`,如何排查解决问题呢?
**A**:尝试将训练配置文件中的字段 `num_workers` 设置为 `0`;尝试将训练配置文件中的字段 `batch_size` 调小一些;检查数据集格式和配置文件中的数据集路径是否正确。
#### Q2.1.8: 如何在训练时使用 `Mixup``Cutmix`
**A**
* `Mixup` 的使用方法请参考 [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65)`Cutmix` 请参考 [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65)。
* 在使用 `Mixup``Cutmix` 时,需要注意:
* 配置文件中的 `Loss.Train.CELoss` 需要修改为 `Loss.Train.MixCELoss`,可参考 [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26)
* 使用 `Mixup``Cutmix` 做训练时无法计算训练的精度Acc指标因此需要在配置文件中取消 `Metric.Train.TopkAcc` 字段,可参考 [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128)。
#### Q2.1.9: 训练配置yaml文件中字段 `Global.pretrain_model``Global.checkpoints` 分别用于配置什么呢?
**A**
* 当需要 `fine-tune` 时,可以通过字段 `Global.pretrain_model` 配置预训练模型权重文件的路径,预训练模型权重文件后缀名通常为 `.pdparams`
* 在训练过程中训练程序会自动保存每个epoch结束时的断点信息包括优化器信息 `.pdopt` 和模型权重信息 `.pdparams`。在训练过程意外中断等情况下,需要恢复训练时,可以通过字段 `Global.checkpoints` 配置训练过程中保存的断点信息文件,例如通过配置 `checkpoints: ./output/ResNet18/epoch_18` 即可恢复18epoch训练结束时的断点信息PaddleClas将自动加载 `epoch_18.pdopt``epoch_18.pdparams`从19epoch继续训练。
<a name="2.2图像分类"></a>
### 2.2 图像分类
@ -255,6 +280,9 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
#### Q2.5.3: Mac重新编译index.so时报错如下clang: error: unsupported option '-fopenmp', 该如何处理?
**A**:该问题已经解决。可以参照[文档](../../../develop/deploy/vector_search/README.md)重新编译 index.so。
#### Q2.5.4: 在 build 检索底库时,参数 `pq_size` 应该如何设置?
**A**`pq_size` 是PQ检索算法的参数。PQ检索算法可以简单理解为“分层”检索算法`pq_size` 是每层的“容量”因此该参数的设置会影响检索性能不过在底库总数据量不太大小于10000张的情况下这个参数对性能的影响很小因此对于大多数使用场景而言在构建底库时无需修改该参数。关于PQ检索算法的更多内容可以查看相关[论文](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf)。
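下面给出一个与 PaddleClas 检索实现无关的 PQ乘积量化示意代码仅演示“把特征切分为若干子段、每段分别聚类并用类中心编号编码”的基本思想并非 `pq_size` 参数的实际实现):

```python
import numpy as np

def kmeans(x, k, iters=20):
    # 极简 k-means仅作演示
    centers = x[np.random.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return centers

np.random.seed(0)
feats = np.random.randn(2000, 128).astype("float32")   # 假设的底库特征
m, k = 4, 16                                            # 切成 4 段,每段 16 个聚类中心(可类比“每层容量”)
sub_dim = feats.shape[1] // m

codes = []
for i in range(m):
    sub = feats[:, i * sub_dim:(i + 1) * sub_dim]
    cb = kmeans(sub, k)
    codes.append(((sub[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1))
codes = np.stack(codes, axis=1)   # 每条特征被压缩为 m 个小整数编码
print(codes.shape, codes[:2])
```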
<a name="2.6模型预测部署"></a>
### 2.6 模型预测部署
@ -267,3 +295,48 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
UserWarning: Skip loading for ***. *** is not found in the provided dict.
```
如果存在,则说明模型权重未能加载成功,请进一步检查配置文件中的 `Global.pretrained_model` 字段,是否正确配置了模型权重文件的路径。模型权重文件后缀名通常为 `pdparams`,注意在配置该路径时无需填写文件后缀名。
#### Q2.6.3: 如何将模型转为 `ONNX` 格式?
**A**Paddle支持两种转ONNX格式模型的方式且依赖于 `paddle2onnx` 工具,首先需要安装 `paddle2onnx`
```shell
pip install paddle2onnx
```
* 从 inference model 转为 ONNX 格式模型:
以动态图导出的 `combined` 格式 inference model包含 `.pdmodel``.pdiparams` 两个文件)为例,使用以下命令进行模型格式转换:
```shell
paddle2onnx --model_dir ${model_path} --model_filename ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
```
上述命令中:
* `model_dir`:该参数下需要包含 `.pdmodel``.pdiparams` 两个文件;
* `model_filename`:该参数用于指定参数 `model_dir` 下的 `.pdmodel` 文件路径;
* `params_filename`:该参数用于指定参数 `model_dir` 下的 `.pdiparams` 文件路径;
* `save_file`:该参数用于指定转换后的模型保存目录路径。
关于静态图导出的非 `combined` 格式的 inference model通常包含文件 `__model__` 和多个参数文件)转换模型格式,以及更多参数说明请参考 paddle2onnx 官方文档 [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9)。
* 直接从模型组网代码导出ONNX格式模型
以动态图模型组网代码为例,模型类为继承于 `paddle.nn.Layer` 的子类,代码如下所示:
```python
import paddle
from paddle.static import InputSpec
class SimpleNet(paddle.nn.Layer):
def __init__(self):
pass
def forward(self, x):
pass
net = SimpleNet()
x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
```
其中:
* `InputSpec()` 函数用于描述模型输入的签名信息,包括输入数据的 `shape`、`type` 和 `name`(可省略);
* `paddle.onnx.export()` 函数需要指定模型组网对象 `net`,导出模型的保存路径 `save_path`,模型的输入数据描述 `input_spec`
需要注意,`paddlepaddle` 版本需大于 `2.0.0`。关于 `paddle.onnx.export()` 函数的更多参数说明请参考[paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export)。

View File

@ -0,0 +1,41 @@
# PPLCNet系列
## 概述
PPLCNet系列是百度PaddleCV团队提出的一种在Intel-CPU上表现优异的网络作者总结了一些在Intel-CPU上可以提升模型精度但几乎不增加推理耗时的方法将这些方法组合成了一个新的网络即PPLCNet。与其他轻量级网络相比PPLCNet可以在相同延时下取得更高的精度。PPLCNet已在图像分类、目标检测、语义分割上表现出了强大的竞争力。
## 精度、FLOPS和参数量
| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |
## 基于Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz的预测速度
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |

View File

@ -0,0 +1,41 @@
# PPLCNet系列
## 概述
PPLCNet系列是百度PaddleCV团队提出的一种在Intel-CPU上表现优异的网络作者总结了一些在Intel-CPU上可以提升模型精度但几乎不增加推理耗时的方法将这些方法组合成了一个新的网络即PPLCNet。与其他轻量级网络相比PPLCNet可以在相同延时下取得更高的精度。PPLCNet已在图像分类、目标检测、语义分割上表现出了强大的竞争力。
## 精度、FLOPS和参数量
| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |
## 基于Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz的预测速度
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |

View File

@ -117,7 +117,7 @@ python3 tools/train.py \
其中,`-c`用于指定配置文件的路径,`-o`用于指定需要修改或者添加的参数,其中`-o Arch.Backbone.pretrained=True`表示Backbone部分使用预训练模型此外`Arch.Backbone.pretrained`也可以指定具体的模型权重文件的地址,使用时需要换成自己的预训练模型权重文件的路径。`-o Global.device=gpu`表示使用GPU进行训练。如果希望使用CPU进行训练则需要将`Global.device`设置为`cpu`。
更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config.md)。
更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config_description.md)。
运行上述命令,可以看到输出日志,示例如下:
@ -245,4 +245,4 @@ python3 tools/export_model.py \
- 平均检索精度(mAP)
- AP: AP指的是不同召回率上的正确率的平均值
- mAP: 测试集中所有图片对应的AP的平均值计算方式可参考下面的示意代码
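下面是一段示意性的计算代码(按最常见的定义,对每个命中位置上的精度取平均,即“不同召回率上的正确率的平均值”,具体实现以评测脚本为准):

```python
import numpy as np

def average_precision(rel):
    # rel: 按相似度从高到低排序后的 0/1 相关性标记
    rel = np.asarray(rel, dtype="float32")
    if rel.sum() == 0:
        return 0.0
    cum_correct = np.cumsum(rel)
    precision_at_k = cum_correct / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

# 两张查询图的检索结果1 表示该位置的底库图与查询图同类
aps = [average_precision([1, 0, 1, 0]), average_precision([0, 1, 1, 1])]
print("mAP:", sum(aps) / len(aps))   # 所有查询图 AP 的平均值
```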

View File

@ -34,6 +34,8 @@
检测模型与4个方向(Logo、动漫人物、车辆、商品)的识别inference模型、测试数据下载地址以及对应的配置文件地址如下。
服务器端通用主体检测模型与各方向识别模型:
| 模型简介 | 推荐场景 | inference模型 | 预测配置文件 | 构建索引库的配置文件 |
| ------------ | ------------- | -------- | ------- | -------- |
| 通用主体检测模型 | 通用场景 |[模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | - |
@ -43,6 +45,12 @@
| 商品识别模型 | 商品场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
| 车辆ReID模型 | 车辆ReID场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | - | - |
轻量级通用主体检测模型与轻量级通用识别模型:
| 模型简介 | 推荐场景 | inference模型 | 预测配置文件 | 构建索引库的配置文件 |
| ------------ | ------------- | -------- | ------- | -------- |
| 轻量级通用主体检测模型 | 通用场景 |[模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | - | - |
| 轻量级通用识别模型 | 通用场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
本章节demo数据下载地址如下: [数据下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar)。
@ -50,6 +58,7 @@
1. windows 环境下如果没有安装wget,可以按照下面的步骤安装wget与tar命令也可以在下载模型时将链接复制到浏览器中下载并解压放置在相应目录下linux或者macOS用户可以右键点击然后复制下载链接即可通过`wget`命令下载。
2. 如果macOS环境下没有安装`wget`命令,可以运行下面的命令进行安装。
3. 轻量级通用识别模型的预测配置文件和构建索引的配置文件目前使用的是服务器端商品识别模型的配置,您可以自行修改模型的路径完成相应的索引构建和识别预测。
```shell
# 安装 homebrew
@ -124,6 +133,13 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit
│ └── inference.pdmodel
```
**注意**
如果使用轻量级通用识别模型Demo 数据需要重新提取特征、重新构建索引,方式如下:
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer
```
<a name="商品识别与检索"></a>
### 2.2 商品识别与检索

View File

@ -0,0 +1,218 @@
# 主体检测
主体检测技术是目前应用非常广泛的一种检测技术,它指的是检测出图片中一个或者多个主体的坐标位置,然后将图像中的对应区域裁剪下来,进行识别,从而完成整个识别过程。主体检测是识别任务的前序步骤,可以有效提升识别精度。
本部分主要从数据集、模型选择和模型训练 3 个方面对该部分内容进行介绍。
## 1. 数据集
在 PaddleClas 的识别任务中,训练主体检测模型时主要用到了以下几个数据集。
| 数据集 | 数据量 | 主体检测任务中使用的数据量 | 场景 | 数据集地址 |
| :------------: | :-------------: | :-------: | :-------: | :--------: |
| Objects365 | 170W | 6k | 通用场景 | [地址](https://www.objects365.org/overview.html) |
| COCO2017 | 12W | 5k | 通用场景 | [地址](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | 动漫人脸检测 | [地址](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | Logo检测 | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | 商品检测 | [地址](https://rpc-dataset.github.io/) |
在实际训练的过程中,将所有数据集混合在一起。由于是主体检测,这里将所有标注出的检测框对应的类别都修改为 `前景` 的类别,最终融合的数据集中只包含 1 个类别,即前景。
## 2. 模型选择
目标检测方法种类繁多比较常用的有两阶段检测器如FasterRCNN系列等单阶段检测器如YOLO、SSD等anchor-free检测器如PicoDet、FCOS等。PaddleDetection中针对服务端使用场景自研了 PP-YOLO 系列模型针对端侧CPU和移动端等使用场景自研了 PicoDet 系列模型,在服务端和端侧均处于业界较为领先的水平。
基于上述研究PaddleClas 中提供了 2 个通用主体检测模型,为轻量级与服务端主体检测模型,分别适用于端侧场景以及服务端场景。下面的表格中给出了在上述 5 个数据集上的平均 mAP 以及它们的模型大小、预测速度对比信息。
| 模型 | 模型结构 | 预训练模型下载地址 | inference模型下载地址 | mAP | inference模型大小(MB) | 单张图片预测耗时(不包含预处理)(ms) |
| :------------: | :-------------: | :------: | :-------: | :--------: | :-------: | :--------: |
| 轻量级主体检测模型 | PicoDet | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | 40.1% | 30.1 | 29.8 |
| 服务端主体检测模型 | PP-YOLOv2 | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams) | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | 42.5% | 210.5 | 466.6 |
* 注意
* 速度评测机器的CPU具体信息为`Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`,速度指标为开启 mkldnn ,线程数设置为 10 测试得到。
* 主体检测的预处理过程较为耗时,平均每张图在上述机器上的时间在 40~55 ms 左右,没有包含在上述的预测耗时统计中。
### 2.1 轻量级主体检测模型
PicoDet 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出是一个适用于CPU或者移动端场景的目标检测算法。具体地它融合了下面一系列优化算法。
- [ATSS](https://arxiv.org/abs/1912.02424)
- [Generalized Focal Loss](https://arxiv.org/abs/2006.04388)
- 余弦学习率策略
- Cycle-EMA
- 轻量级检测 head
更多关于 PicoDet 的优化细节与 benchmark 可以参考 [PicoDet 系列模型介绍](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/README.md)。
在轻量级主体检测任务中,为了更好地兼顾检测速度与效果,我们使用 PPLCNet_x2_5 作为主体检测模型的骨干网络,同时将训练与预测的图像尺度修改为了 640x640其余配置与 [picodet_m_shufflenetv2_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/picodet_m_shufflenetv2_416_coco.yml)完全一致。将数据集更换为自定义的主体检测数据集,进行训练,最终得到检测模型。
### 2.2 服务端主体检测模型
PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出,从骨干网络、数据增广、正则化策略、损失函数、后处理等多个角度对 yolov3 模型进行深度优化,最终在"速度-精度"方面达到了业界领先的水平。具体地,优化的策略如下。
- 更优的骨干网络: ResNet50vd-DCN
- 更大的训练batch size: 8 GPUs每GPU batch_size=24对应调整学习率和迭代轮数
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- 更优的预训练模型
更多关于 PP-YOLO 的详细介绍可以参考:[PP-YOLO 模型](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README_cn.md)。
在服务端主体检测任务中,为了保证检测效果,我们使用 ResNet50vd-DCN 作为检测模型的骨干网络,使用配置文件 [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) ,更换为自定义的主体检测数据集,进行训练,最终得到检测模型。
## 3. 模型训练
本节主要介绍如何基于 PaddleDetection在自己的数据集上训练主体检测模型。
### 3.1 环境准备
下载PaddleDetection代码安装requirements。
```shell
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection
# 安装其他依赖
pip install -r requirements.txt
```
更多安装教程,请参考: [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md)
### 3.2 数据准备
对于自定义数据集首先需要将自己的数据集修改为COCO格式可以参考[自定义检测数据集教程](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/static/docs/tutorials/Custom_DataSet.md)制作COCO格式的数据集。
主体检测任务中,所有的检测框均属于前景,在这里需要将标注文件中,检测框的`category_id`修改为1同时将整个标注文件中的`categories`映射表修改为下面的格式,即整个类别映射表中只包含`前景`类别。
```json
[{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}]
```
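下面是一个示意脚本(标注文件路径为假设值),用于把已有的 COCO 标注批量映射为只含“前景”一个类别:

```python
import json

ann_file = "annotations/instances_train.json"   # 假设的标注文件路径

with open(ann_file, "r") as f:
    coco = json.load(f)

# 所有检测框都视为前景,类别 id 统一改为 1
for ann in coco["annotations"]:
    ann["category_id"] = 1

# 类别映射表中只保留“前景”一个类别
coco["categories"] = [{"id": 1, "name": "foreground", "supercategory": "foreground"}]

with open("annotations/instances_train_mainbody.json", "w") as f:
    json.dump(coco, f)
```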
### 3.3 配置文件改动和说明
我们使用 `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` 配置进行训练,配置文件摘要如下:
<div align='center'>
<img src='../../images/det/PaddleDetection_config.png' width='400'/>
</div>
从上图看到 `ppyolov2_r50vd_dcn_365e_coco.yml` 配置需要依赖其他的配置文件,这些配置文件的含义如下:
```
coco_detection.yml主要说明了训练数据和验证数据的路径
runtime.yml主要说明了公共的运行参数比如是否使用GPU、每多少个epoch存储checkpoint等
optimizer_365e.yml主要说明了学习率和优化器的配置
ppyolov2_r50vd_dcn.yml主要说明模型和主干网络的情况
ppyolov2_reader.yml主要说明数据读取器配置如 batch size并发加载子进程数等同时包含读取后预处理操作如resize、数据增强等等
```
在主体检测任务中,需要将 `datasets/coco_detection.yml` 中的 `num_classes` 参数修改为 1 (只有 1 个前景类别),同时将训练集和测试集的路径修改为自定义数据集的路径。
此外,也可以根据实际情况,修改上述文件,比如,如果显存溢出,可以将 batch size 和学习率等比缩小等。
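batch size 与学习率的等比缩放可以按线性缩放规则估算,示意如下(其中的原始 batch size 与学习率为假设值,请以实际配置文件为准):

```python
# 线性缩放规则batch size 缩小为原来的 1/N 时,学习率也等比缩小为 1/N
base_batch_size, base_lr = 24, 0.005      # 假设的原始配置
new_batch_size = 12                       # 显存不足时调小后的 batch size
new_lr = base_lr * new_batch_size / base_batch_size
print(new_lr)                             # 0.0025,写回 optimizer 配置即可
```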
### 3.4 启动训练
PaddleDetection 提供了单卡/多卡训练模式,满足用户多种训练需求。
* GPU 单卡训练
```bash
# windows和Mac下不需要执行该命令
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml
```
* GPU多卡训练
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval
```
--eval表示边训练边验证。
* (**推荐**)模型微调
如果希望加载 PaddleClas 中已经训练好的主体检测模型,在自己的数据集上进行模型微调,可以使用下面的命令进行训练。
```bash
export CUDA_VISIBLE_DEVICES=0
# 指定pretrain_weights参数加载通用的主体检测预训练模型
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pretrain_weights=https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams
```
* 模型恢复训练
在日常训练过程中,有的用户由于一些原因导致训练中断,可以使用 `-r` 的命令恢复训练:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval -r output/ppyolov2_r50vd_dcn_365e_coco/10000
```
注意:如果遇到 "`Out of memory error`" 问题, 尝试在 `ppyolov2_reader.yml` 文件中调小`batch_size`,同时等比例调小学习率。
### 3.5 模型预测与调试
使用下面的命令完成 PaddleDetection 的预测过程。
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final
```
`--draw_threshold` 是个可选参数。根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果。`keep_top_k` 表示设置输出目标的最大数量,默认值为 100 ,用户可以根据自己的实际情况进行设定。
### 3.6 模型导出与预测部署
执行导出模型脚本:
```bash
python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --output_dir=./inference -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final.pdparams
```
预测模型会导出到 `inference/ppyolov2_r50vd_dcn_365e_coco` 目录下,分别为 `infer_cfg.yml` (预测不需要), `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`
注意: `PaddleDetection` 导出的inference模型的文件格式为 `model.xxx`这里如果希望与PaddleClas的inference模型文件格式保持一致需要将其 `model.xxx` 文件修改为 `inference.xxx` 文件,用于后续主体检测的预测部署。
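重命名可以手动完成,也可以用类似下面的脚本批量处理(导出目录为假设值,请按实际路径修改):

```python
import os

export_dir = "inference/ppyolov2_r50vd_dcn_365e_coco"   # 假设的导出目录

for fname in os.listdir(export_dir):
    if fname.startswith("model."):
        new_name = "inference." + fname.split(".", 1)[1]   # model.pdmodel -> inference.pdmodel
        os.rename(os.path.join(export_dir, fname),
                  os.path.join(export_dir, new_name))
```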
更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md)
最终,目录 `inference/ppyolov2_r50vd_dcn_365e_coco` 中包含 `inference.pdiparams`, `inference.pdiparams.info` 以及 `inference.pdmodel` 文件,其中 `inference.pdiparams` 为保存的 inference 模型权重文件, `inference.pdmodel` 为保存的 inference 模型结构文件。
导出模型之后,在主体检测与识别任务中,就可以将检测模型的路径更改为该 inference 模型路径,完成预测。
以商品识别为例,其配置文件为 [inference_product.yaml](../../../deploy/configs/inference_product.yaml) ,修改其中的 `Global.det_inference_model_dir` 字段为导出的主体检测 inference 模型目录,参考[图像识别快速开始教程](../tutorials/quick_start_recognition.md) ,即可完成商品检测与识别过程。
### FAQ
#### Q可以使用其他的主体检测模型结构吗
* A可以的但是目前的检测预处理过程仅适配了 PicoDet 以及 YOLO 系列的预处理,因此在使用的时候,建议优先使用这两个系列的模型进行训练,如果希望使用 Faster RCNN 等其他系列的模型,需要按照 PaddleDetection 的数据预处理,修改下预处理逻辑,这块如果您有需求或者有问题的话,欢迎提 issue 或者在微信群里反馈。
#### Q可以修改主体检测的预测尺度吗
* A可以的但是需要注意 2 个地方
* PaddleClas 中提供的主体检测模型是基于 `640x640` 的分辨率去训练的,因此预测的时候也是默认使用 `640x640` 的分辨率进行预测,使用其他分辨率预测的话,精度会有所降低。
* 在模型导出的时候,建议也修改下模型导出的分辨率,保持模型导出、模型预测的分辨率一致。

View File

@ -21,6 +21,7 @@ from ppcls.arch.backbone.legendary_models.resnet import ResNet18, ResNet18_vd, R
from ppcls.arch.backbone.legendary_models.vgg import VGG11, VGG13, VGG16, VGG19
from ppcls.arch.backbone.legendary_models.inception_v3 import InceptionV3
from ppcls.arch.backbone.legendary_models.hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W64_C
from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x0_25, PPLCNet_x0_35, PPLCNet_x0_5, PPLCNet_x0_75, PPLCNet_x1_0, PPLCNet_x1_5, PPLCNet_x2_0, PPLCNet_x2_5
from ppcls.arch.backbone.model_zoo.resnet_vc import ResNet50_vc
from ppcls.arch.backbone.model_zoo.resnext import ResNeXt50_32x4d, ResNeXt50_64x4d, ResNeXt101_32x4d, ResNeXt101_64x4d, ResNeXt152_32x4d, ResNeXt152_64x4d
@ -57,6 +58,7 @@ from ppcls.arch.backbone.model_zoo.dla import DLA34, DLA46_c, DLA46x_c, DLA60, D
from ppcls.arch.backbone.model_zoo.rednet import RedNet26, RedNet38, RedNet50, RedNet101, RedNet152
from ppcls.arch.backbone.model_zoo.tnt import TNT_small
from ppcls.arch.backbone.model_zoo.hardnet import HarDNet68, HarDNet85, HarDNet39_ds, HarDNet68_ds
from ppcls.arch.backbone.model_zoo.cspnet import CSPDarkNet53
from ppcls.arch.backbone.variant_models.resnet_variant import ResNet50_last_stage_stride1
from ppcls.arch.backbone.variant_models.vgg_variant import VGG19Sigmoid

View File

@ -0,0 +1,399 @@
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import, division, print_function
import paddle
import paddle.nn as nn
from paddle import ParamAttr
from paddle.nn import AdaptiveAvgPool2D, BatchNorm, Conv2D, Dropout, Linear
from paddle.regularizer import L2Decay
from paddle.nn.initializer import KaimingNormal
from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
MODEL_URLS = {
"PPLCNet_x0_25":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams",
"PPLCNet_x0_35":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams",
"PPLCNet_x0_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams",
"PPLCNet_x0_75":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams",
"PPLCNet_x1_0":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams",
"PPLCNet_x1_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams",
"PPLCNet_x2_0":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams",
"PPLCNet_x2_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams"
}
__all__ = list(MODEL_URLS.keys())
# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
# k: kernel_size
# in_c: input channel number in depthwise block
# out_c: output channel number in depthwise block
# s: stride in depthwise block
# use_se: whether to use SE block
NET_CONFIG = {
"blocks2":
#k, in_c, out_c, s, use_se
[[3, 16, 32, 1, False]],
"blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
"blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
"blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False]],
"blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
}
def make_divisible(v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNLayer(TheseusLayer):
def __init__(self,
num_channels,
filter_size,
num_filters,
stride,
num_groups=1):
super().__init__()
self.conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm(
num_filters,
param_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
self.hardswish = nn.Hardswish()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.hardswish(x)
return x
class DepthwiseSeparable(TheseusLayer):
def __init__(self,
num_channels,
num_filters,
stride,
dw_size=3,
use_se=False):
super().__init__()
self.use_se = use_se
self.dw_conv = ConvBNLayer(
num_channels=num_channels,
num_filters=num_channels,
filter_size=dw_size,
stride=stride,
num_groups=num_channels)
if use_se:
self.se = SEModule(num_channels)
self.pw_conv = ConvBNLayer(
num_channels=num_channels,
filter_size=1,
num_filters=num_filters,
stride=1)
def forward(self, x):
x = self.dw_conv(x)
if self.use_se:
x = self.se(x)
x = self.pw_conv(x)
return x
class SEModule(TheseusLayer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(
in_channels=channel,
out_channels=channel // reduction,
kernel_size=1,
stride=1,
padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(
in_channels=channel // reduction,
out_channels=channel,
kernel_size=1,
stride=1,
padding=0)
self.hardsigmoid = nn.Hardsigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
class PPLCNet(TheseusLayer):
def __init__(self,
scale=1.0,
class_num=1000,
dropout_prob=0.2,
class_expand=1280):
super().__init__()
self.scale = scale
self.class_expand = class_expand
self.conv1 = ConvBNLayer(
num_channels=3,
filter_size=3,
num_filters=make_divisible(16 * scale),
stride=2)
self.blocks2 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
])
self.blocks3 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
])
self.blocks4 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
])
self.blocks5 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
])
self.blocks6 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
])
self.avg_pool = AdaptiveAvgPool2D(1)
self.last_conv = Conv2D(
in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
self.fc = Linear(self.class_expand, class_num)
def forward(self, x):
x = self.conv1(x)
x = self.blocks2(x)
x = self.blocks3(x)
x = self.blocks4(x)
x = self.blocks5(x)
x = self.blocks6(x)
x = self.avg_pool(x)
x = self.last_conv(x)
x = self.hardswish(x)
x = self.dropout(x)
x = self.flatten(x)
x = self.fc(x)
return x
def _load_pretrained(pretrained, model, model_url, use_ssld):
if pretrained is False:
pass
elif pretrained is True:
load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
elif isinstance(pretrained, str):
load_dygraph_pretrain(model, pretrained)
else:
raise RuntimeError(
"pretrained type is not available. Please use `string` or `boolean` type."
)
def PPLCNet_x0_25(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_25
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_25` model depends on args.
"""
model = PPLCNet(scale=0.25, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_25"], use_ssld)
return model
def PPLCNet_x0_35(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_35
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_35` model depends on args.
"""
model = PPLCNet(scale=0.35, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_35"], use_ssld)
return model
def PPLCNet_x0_5(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_5
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_5` model depends on args.
"""
model = PPLCNet(scale=0.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_5"], use_ssld)
return model
def PPLCNet_x0_75(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_75
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_75` model depends on args.
"""
model = PPLCNet(scale=0.75, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_75"], use_ssld)
return model
def PPLCNet_x1_0(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x1_0
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x1_0` model depends on args.
"""
model = PPLCNet(scale=1.0, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_0"], use_ssld)
return model
def PPLCNet_x1_5(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x1_5
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x1_5` model depends on args.
"""
model = PPLCNet(scale=1.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_5"], use_ssld)
return model
def PPLCNet_x2_0(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x2_0
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x2_0` model depends on args.
"""
model = PPLCNet(scale=2.0, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_0"], use_ssld)
return model
def PPLCNet_x2_5(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x2_5
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x2_5` model depends on args.
"""
model = PPLCNet(scale=2.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_5"], use_ssld)
return model
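A minimal usage sketch for the factory functions above (assuming paddle is installed and this module is importable; the import path in the comment is illustrative, not taken from the diff):

import paddle
# hypothetical import path -- adjust to wherever this file lives in the repo
# from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x1_0

model = PPLCNet_x1_0(pretrained=False, class_num=1000)  # pretrained=True would download the weights in MODEL_URLS
model.eval()
x = paddle.rand([1, 3, 224, 224])
with paddle.no_grad():
    logits = model(x)
print(logits.shape)  # expected: [1, 1000]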

View File

@ -0,0 +1,374 @@
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle import ParamAttr
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
MODEL_URLS = {
"CSPDarkNet53":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/CSPDarkNet53_pretrained.pdparams"
}
MODEL_CFGS = {
"CSPDarkNet53": dict(
stem=dict(
out_chs=32, kernel_size=3, stride=1, pool=''),
stage=dict(
out_chs=(64, 128, 256, 512, 1024),
depth=(1, 2, 8, 8, 4),
stride=(2, ) * 5,
exp_ratio=(2., ) + (1., ) * 4,
bottle_ratio=(0.5, ) + (1.0, ) * 4,
block_ratio=(1., ) + (0.5, ) * 4,
down_growth=True, ))
}
__all__ = ['CSPDarkNet53'
] # model_registry will add each entrypoint fn to this
class ConvBnAct(nn.Layer):
def __init__(self,
input_channels,
output_channels,
kernel_size=1,
stride=1,
padding=None,
dilation=1,
groups=1,
act_layer=nn.LeakyReLU,
norm_layer=nn.BatchNorm2D):
super().__init__()
if padding is None:
padding = (kernel_size - 1) // 2
self.conv = nn.Conv2D(
in_channels=input_channels,
out_channels=output_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
weight_attr=ParamAttr(),
bias_attr=False)
self.bn = norm_layer(num_features=output_channels)
self.act = act_layer() if act_layer is not None else None
def forward(self, inputs):
x = self.conv(inputs)
x = self.bn(x)
if self.act is not None:
x = self.act(x)
return x
def create_stem(in_chans=3,
out_chs=32,
kernel_size=3,
stride=2,
pool='',
act_layer=None,
norm_layer=None):
stem = nn.Sequential()
if not isinstance(out_chs, (tuple, list)):
out_chs = [out_chs]
assert len(out_chs)
in_c = in_chans
for i, out_c in enumerate(out_chs):
conv_name = f'conv{i + 1}'
stem.add_sublayer(
conv_name,
ConvBnAct(
in_c,
out_c,
kernel_size,
stride=stride if i == 0 else 1,
act_layer=act_layer,
norm_layer=norm_layer))
in_c = out_c
last_conv = conv_name
if pool:
stem.add_sublayer(
'pool', nn.MaxPool2D(
kernel_size=3, stride=2, padding=1))
return stem, dict(
num_chs=in_c, reduction=stride, module='.'.join(['stem', last_conv]))
class DarkBlock(nn.Layer):
def __init__(self,
in_chs,
out_chs,
dilation=1,
bottle_ratio=0.5,
groups=1,
act_layer=nn.ReLU,
norm_layer=nn.BatchNorm2D,
attn_layer=None,
drop_block=None):
super(DarkBlock, self).__init__()
mid_chs = int(round(out_chs * bottle_ratio))
ckwargs = dict(act_layer=act_layer, norm_layer=norm_layer)
self.conv1 = ConvBnAct(in_chs, mid_chs, kernel_size=1, **ckwargs)
self.conv2 = ConvBnAct(
mid_chs,
out_chs,
kernel_size=3,
dilation=dilation,
groups=groups,
**ckwargs)
def forward(self, x):
shortcut = x
x = self.conv1(x)
x = self.conv2(x)
x = x + shortcut
return x
class CrossStage(nn.Layer):
def __init__(self,
in_chs,
out_chs,
stride,
dilation,
depth,
block_ratio=1.,
bottle_ratio=1.,
exp_ratio=1.,
groups=1,
first_dilation=None,
down_growth=False,
cross_linear=False,
block_dpr=None,
block_fn=DarkBlock,
**block_kwargs):
super(CrossStage, self).__init__()
first_dilation = first_dilation or dilation
down_chs = out_chs if down_growth else in_chs
exp_chs = int(round(out_chs * exp_ratio))
block_out_chs = int(round(out_chs * block_ratio))
conv_kwargs = dict(
act_layer=block_kwargs.get('act_layer'),
norm_layer=block_kwargs.get('norm_layer'))
if stride != 1 or first_dilation != dilation:
self.conv_down = ConvBnAct(
in_chs,
down_chs,
kernel_size=3,
stride=stride,
dilation=first_dilation,
groups=groups,
**conv_kwargs)
prev_chs = down_chs
else:
self.conv_down = None
prev_chs = in_chs
self.conv_exp = ConvBnAct(
prev_chs, exp_chs, kernel_size=1, **conv_kwargs)
prev_chs = exp_chs // 2 # output of conv_exp is always split in two
self.blocks = nn.Sequential()
for i in range(depth):
self.blocks.add_sublayer(
str(i),
block_fn(prev_chs, block_out_chs, dilation, bottle_ratio,
groups, **block_kwargs))
prev_chs = block_out_chs
# transition convs
self.conv_transition_b = ConvBnAct(
prev_chs, exp_chs // 2, kernel_size=1, **conv_kwargs)
self.conv_transition = ConvBnAct(
exp_chs, out_chs, kernel_size=1, **conv_kwargs)
def forward(self, x):
if self.conv_down is not None:
x = self.conv_down(x)
x = self.conv_exp(x)
split = x.shape[1] // 2
xs, xb = x[:, :split], x[:, split:]
xb = self.blocks(xb)
xb = self.conv_transition_b(xb)
out = self.conv_transition(paddle.concat([xs, xb], axis=1))
return out
class DarkStage(nn.Layer):
def __init__(self,
in_chs,
out_chs,
stride,
dilation,
depth,
block_ratio=1.,
bottle_ratio=1.,
groups=1,
first_dilation=None,
block_fn=DarkBlock,
block_dpr=None,
**block_kwargs):
super().__init__()
first_dilation = first_dilation or dilation
self.conv_down = ConvBnAct(
in_chs,
out_chs,
kernel_size=3,
stride=stride,
dilation=first_dilation,
groups=groups,
act_layer=block_kwargs.get('act_layer'),
norm_layer=block_kwargs.get('norm_layer'))
prev_chs = out_chs
block_out_chs = int(round(out_chs * block_ratio))
self.blocks = nn.Sequential()
for i in range(depth):
self.blocks.add_sublayer(
str(i),
block_fn(prev_chs, block_out_chs, dilation, bottle_ratio,
groups, **block_kwargs))
prev_chs = block_out_chs
def forward(self, x):
x = self.conv_down(x)
x = self.blocks(x)
return x
def _cfg_to_stage_args(cfg, curr_stride=2, output_stride=32):
# get per-stage args for each stage and its blocks; adjust strides/dilations to meet the target output_stride
num_stages = len(cfg['depth'])
if 'groups' not in cfg:
cfg['groups'] = (1, ) * num_stages
if 'down_growth' in cfg and not isinstance(cfg['down_growth'],
(list, tuple)):
cfg['down_growth'] = (cfg['down_growth'], ) * num_stages
stage_strides = []
stage_dilations = []
stage_first_dilations = []
dilation = 1
for cfg_stride in cfg['stride']:
stage_first_dilations.append(dilation)
if curr_stride >= output_stride:
dilation *= cfg_stride
stride = 1
else:
stride = cfg_stride
curr_stride *= stride
stage_strides.append(stride)
stage_dilations.append(dilation)
cfg['stride'] = stage_strides
cfg['dilation'] = stage_dilations
cfg['first_dilation'] = stage_first_dilations
stage_args = [
dict(zip(cfg.keys(), values)) for values in zip(*cfg.values())
]
return stage_args
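# For the CSPDarkNet53 config above (five stages, cfg stride (2,)*5, curr_stride=1 after the stem):
#   output_stride=32 -> per-stage stride (2, 2, 2, 2, 2), dilation (1, 1, 1, 1, 1)
#   output_stride=16 -> per-stage stride (2, 2, 2, 2, 1), dilation (1, 1, 1, 1, 2)
# i.e. once the running stride reaches the target, further downsampling is traded for dilation.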
class CSPNet(nn.Layer):
def __init__(self,
cfg,
in_chans=3,
class_num=1000,
output_stride=32,
global_pool='avg',
drop_rate=0.,
act_layer=nn.LeakyReLU,
norm_layer=nn.BatchNorm2D,
zero_init_last_bn=True,
stage_fn=CrossStage,
block_fn=DarkBlock):
super().__init__()
self.class_num = class_num
self.drop_rate = drop_rate
assert output_stride in (8, 16, 32)
layer_args = dict(act_layer=act_layer, norm_layer=norm_layer)
# Construct the stem
self.stem, stem_feat_info = create_stem(in_chans, **cfg['stem'],
**layer_args)
self.feature_info = [stem_feat_info]
prev_chs = stem_feat_info['num_chs']
curr_stride = stem_feat_info[
'reduction'] # reduction does not include pool
if cfg['stem']['pool']:
curr_stride *= 2
# Construct the stages
per_stage_args = _cfg_to_stage_args(
cfg['stage'], curr_stride=curr_stride, output_stride=output_stride)
self.stages = nn.LayerList()
for i, sa in enumerate(per_stage_args):
self.stages.add_sublayer(
str(i),
stage_fn(
prev_chs, **sa, **layer_args, block_fn=block_fn))
prev_chs = sa['out_chs']
curr_stride *= sa['stride']
self.feature_info += [
dict(
num_chs=prev_chs,
reduction=curr_stride,
module=f'stages.{i}')
]
# Construct the head
self.num_features = prev_chs
self.pool = nn.AdaptiveAvgPool2D(1)
self.flatten = nn.Flatten(1)
self.fc = nn.Linear(
prev_chs,
class_num,
weight_attr=ParamAttr(),
bias_attr=ParamAttr())
def forward(self, x):
x = self.stem(x)
for stage in self.stages:
x = stage(x)
x = self.pool(x)
x = self.flatten(x)
x = self.fc(x)
return x
def _load_pretrained(pretrained, model, model_url, use_ssld=False):
if pretrained is False:
pass
elif pretrained is True:
load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
elif isinstance(pretrained, str):
load_dygraph_pretrain(model, pretrained)
else:
raise RuntimeError(
"pretrained type is not available. Please use `string` or `boolean` type."
)
def CSPDarkNet53(pretrained=False, use_ssld=False, **kwargs):
model = CSPNet(MODEL_CFGS["CSPDarkNet53"], block_fn=DarkBlock, **kwargs)
_load_pretrained(
pretrained, model, MODEL_URLS["CSPDarkNet53"], use_ssld=use_ssld)
return model
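A quick sanity-check sketch for the model above (assumes paddle and the definitions in this file are in scope):

import paddle

model = CSPDarkNet53(pretrained=False, class_num=1000)
# feature_info tracks channels / total stride after the stem and each CSP stage
for info in model.feature_info:
    print(info["num_chs"], info["reduction"], info["module"])
x = paddle.rand([1, 3, 256, 256])   # the CSPDarkNet53 training config in this PR crops to 256x256
logits = model(x)
print(logits.shape)                  # expected: [1, 1000]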

View File

@ -131,7 +131,7 @@ class GoogLeNetDY(nn.Layer):
self._ince5b = Inception(
832, 832, 384, 192, 384, 48, 128, 128, name="ince5b")
self._pool_5 = AvgPool2D(kernel_size=7, stride=7)
self._pool_5 = AdaptiveAvgPool2D(1)
self._drop = Dropout(p=0.4, mode="downscale_in_infer")
self._fc_out = Linear(

View File

@ -24,30 +24,25 @@ class ArcMargin(nn.Layer):
margin=0.5,
scale=80.0,
easy_margin=False):
super(ArcMargin, self).__init__()
super().__init__()
self.embedding_size = embedding_size
self.class_num = class_num
self.margin = margin
self.scale = scale
self.easy_margin = easy_margin
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.XavierNormal())
self.fc = nn.Linear(
self.embedding_size,
self.class_num,
weight_attr=weight_attr,
bias_attr=False)
self.weight = self.create_parameter(
shape=[self.embedding_size, self.class_num],
is_bias=False,
default_initializer=paddle.nn.initializer.XavierNormal())
def forward(self, input, label=None):
input_norm = paddle.sqrt(
paddle.sum(paddle.square(input), axis=1, keepdim=True))
input = paddle.divide(input, input_norm)
weight = self.fc.weight
weight_norm = paddle.sqrt(
paddle.sum(paddle.square(weight), axis=0, keepdim=True))
weight = paddle.divide(weight, weight_norm)
paddle.sum(paddle.square(self.weight), axis=0, keepdim=True))
weight = paddle.divide(self.weight, weight_norm)
cos = paddle.matmul(input, weight)
if not self.training or label is None:
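The hunk above is truncated before the margin is applied. As a reference, here is a minimal standalone sketch of the ArcMargin-style logits that the new `self.weight` parameter feeds into, written with the same ops used in the diff; the function name is illustrative and easy_margin handling is omitted, so this is not the exact forward from the file:

import math
import paddle
import paddle.nn.functional as F

def arc_margin_logits(feat, weight, label, margin=0.5, scale=80.0):
    # L2-normalize features along dim 1 and the [embedding_size, class_num] weight along dim 0
    feat = paddle.divide(feat, paddle.sqrt(paddle.sum(paddle.square(feat), axis=1, keepdim=True)))
    weight = paddle.divide(weight, paddle.sqrt(paddle.sum(paddle.square(weight), axis=0, keepdim=True)))
    cos = paddle.matmul(feat, weight)                                  # cos(theta), shape [N, class_num]
    sin = paddle.sqrt(paddle.clip(1.0 - paddle.square(cos), min=0.0))
    phi = cos * math.cos(margin) - sin * math.sin(margin)              # cos(theta + margin)
    one_hot = F.one_hot(label.reshape([-1]), num_classes=cos.shape[1])
    return scale * (one_hot * phi + (1.0 - one_hot) * cos)             # margin only on the target class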

View File

@ -26,20 +26,19 @@ class CircleMargin(nn.Layer):
self.embedding_size = embedding_size
self.class_num = class_num
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.XavierNormal())
self.fc = paddle.nn.Linear(
self.embedding_size, self.class_num, weight_attr=weight_attr)
self.weight = self.create_parameter(
shape=[self.embedding_size, self.class_num],
is_bias=False,
default_initializer=paddle.nn.initializer.XavierNormal())
def forward(self, input, label):
feat_norm = paddle.sqrt(
paddle.sum(paddle.square(input), axis=1, keepdim=True))
input = paddle.divide(input, feat_norm)
weight = self.fc.weight
weight_norm = paddle.sqrt(
paddle.sum(paddle.square(weight), axis=0, keepdim=True))
weight = paddle.divide(weight, weight_norm)
paddle.sum(paddle.square(self.weight), axis=0, keepdim=True))
weight = paddle.divide(self.weight, weight_norm)
logits = paddle.matmul(input, weight)
if not self.training or label is None:
@ -49,9 +48,9 @@ class CircleMargin(nn.Layer):
alpha_n = paddle.clip(logits.detach() + self.margin, min=0.)
delta_p = 1 - self.margin
delta_n = self.margin
m_hot = F.one_hot(label.reshape([-1]), num_classes=logits.shape[1])
logits_p = alpha_p * (logits - delta_p)
logits_n = alpha_n * (logits - delta_n)
pre_logits = logits_p * m_hot + logits_n * (1 - m_hot)

View File

@ -25,13 +25,10 @@ class CosMargin(paddle.nn.Layer):
self.embedding_size = embedding_size
self.class_num = class_num
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.XavierNormal())
self.fc = nn.Linear(
self.embedding_size,
self.class_num,
weight_attr=weight_attr,
bias_attr=False)
self.weight = self.create_parameter(
shape=[self.embedding_size, self.class_num],
is_bias=False,
default_initializer=paddle.nn.initializer.XavierNormal())
def forward(self, input, label):
label.stop_gradient = True
@ -40,15 +37,14 @@ class CosMargin(paddle.nn.Layer):
paddle.sum(paddle.square(input), axis=1, keepdim=True))
input = paddle.divide(input, input_norm)
weight = self.fc.weight
weight_norm = paddle.sqrt(
paddle.sum(paddle.square(weight), axis=0, keepdim=True))
weight = paddle.divide(weight, weight_norm)
paddle.sum(paddle.square(self.weight), axis=0, keepdim=True))
weight = paddle.divide(self.weight, weight_norm)
cos = paddle.matmul(input, weight)
if not self.training or label is None:
return cos
cos_m = cos - self.margin
one_hot = paddle.nn.functional.one_hot(label, self.class_num)

View File

@ -0,0 +1,148 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
eval_mode: retrieval
use_dali: False
to_static: False
# model architecture
Arch:
name: RecModel
infer_output_key: features
infer_add_softmax: False
Backbone:
name: PPLCNet_x2_5
pretrained: True
use_ssld: True
BackboneStopLayer:
name: flatten_0
Neck:
name: FC
embedding_size: 1280
class_num: 512
Head:
name: ArcMargin
embedding_size: 512
class_num: 185341
margin: 0.2
scale: 30
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.04
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00001
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/train_reg_all_data.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 256
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
Query:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Gallery:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Metric:
Eval:
- Recallk:
topk: [1, 5]

View File

@ -34,9 +34,8 @@ Optimizer:
momentum: 0.9
lr:
name: Piecewise
learning_rate: 0.01
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
values: [0.01, 0.001, 0.0001, 0.00001]
regularizer:
name: 'L2'
coeff: 0.0001

View File

@ -0,0 +1,131 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
# model architecture
Arch:
name: CSPDarkNet53
class_num: 1000
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Piecewise
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 256
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 288
- CropImage:
size: 256
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 288
- CropImage:
size: 256
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -54,18 +56,39 @@ DataLoader:
to_rgb: True
channel_first: False
- RandCropImage:
size: 384
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -82,7 +105,9 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -107,7 +132,9 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -54,18 +56,39 @@ DataLoader:
to_rgb: True
channel_first: False
- RandCropImage:
size: 384
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -82,7 +105,9 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -107,7 +132,9 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_25
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_35
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_75
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x1_0
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x1_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x2_0
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,130 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x2_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- AutoAugment:
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -84,7 +110,11 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -92,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -107,7 +137,11 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -120,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
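This diff swaps the Momentum optimizer with a Piecewise schedule for AdamW with a Cosine schedule (peak 5e-4, eta_min 1e-5, 20 warmup epochs starting from 1e-6). A rough per-epoch sketch of such a warmup-plus-cosine curve, assuming linear warmup and epoch-level decay; the exact step-wise schedule implemented in ppcls.optimizer may differ:
import math
def lr_at_epoch(epoch, epochs=300, learning_rate=5e-4, eta_min=1e-5,
                warmup_epoch=20, warmup_start_lr=1e-6):
    # Linear warmup from warmup_start_lr up to the peak learning_rate
    if epoch < warmup_epoch:
        return warmup_start_lr + (learning_rate - warmup_start_lr) * epoch / warmup_epoch
    # Cosine decay from the peak down to eta_min over the remaining epochs
    progress = (epoch - warmup_epoch) / (epochs - warmup_epoch)
    return eta_min + 0.5 * (learning_rate - eta_min) * (1.0 + math.cos(math.pi * progress))
for e in (0, 10, 20, 160, 299):
    print(e, f"{lr_at_epoch(e):.2e}")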

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -84,7 +110,11 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -92,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -107,7 +137,11 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -120,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: alt_gvt_base
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.3
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: alt_gvt_large
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: alt_gvt_small
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.2
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: pcpvt_base
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.3
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: pcpvt_large
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: pcpvt_small
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.2
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -54,7 +54,7 @@ Optimizer:
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
learning_rate: 0.04
regularizer:
name: 'L2'
coeff: 0.0001
@ -84,10 +84,10 @@ DataLoader:
- RandomErasing:
EPSILON: 0.5
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
drop_last: False
sample_per_id: 2
drop_last: True
loader:
num_workers: 6
@ -97,7 +97,7 @@ DataLoader:
dataset:
name: LogoDataset
image_root: "dataset/LogoDet-3K-crop/val/"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+query.txt"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+val.txt"
transform_ops:
- DecodeImage:
to_rgb: True
@ -122,7 +122,7 @@ DataLoader:
dataset:
name: LogoDataset
image_root: "dataset/LogoDet-3K-crop/train/"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+gallery.txt"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
transform_ops:
- DecodeImage:
to_rgb: True

View File

@ -54,7 +54,7 @@ Optimizer:
momentum: 0.9
lr:
name: MultiStepDecay
learning_rate: 0.01
learning_rate: 0.04
milestones: [30, 60, 70, 80, 90, 100]
gamma: 0.5
verbose: False
@ -90,10 +90,10 @@ DataLoader:
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 64
num_instances: 2
drop_last: False
sample_per_id: 2
drop_last: True
shuffle: True
loader:
num_workers: 4

View File

@ -51,12 +51,8 @@ Optimizer:
name: Momentum
momentum: 0.9
lr:
name: MultiStepDecay
name: Cosine
learning_rate: 0.01
milestones: [30, 60, 70, 80, 90, 100, 120, 140]
gamma: 0.5
verbose: False
last_epoch: -1
regularizer:
name: 'L2'
coeff: 0.0005
@ -91,9 +87,9 @@ DataLoader:
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
sample_per_id: 2
drop_last: False
shuffle: True
loader:

View File

@ -53,8 +53,7 @@ Optimizer:
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
last_epoch: -1
learning_rate: 0.04
regularizer:
name: 'L2'
coeff: 0.0005
@ -89,10 +88,10 @@ DataLoader:
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
drop_last: False
sample_per_id: 2
drop_last: True
shuffle: True
loader:
num_workers: 6

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 10
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
use_multilabel: True
# model architecture
Arch:
name: MobileNetV1
class_num: 33
pretrained: True
# loss function config for training/eval process
Loss:
Train:
- MultiLabelLoss:
weight: 1.0
Eval:
- MultiLabelLoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.1
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/
cls_label_path: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/multilabel_train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: MultiLabelDataset
image_root: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/
cls_label_path: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/multilabel_test_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 256
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: ./deploy/images/0517_2715693311.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: MultiLabelTopk
topk: 5
class_id_map_file: None
Metric:
Train:
- HammingDistance:
- AccuracyScore:
Eval:
- HammingDistance:
- AccuracyScore:
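This multi-label config pairs MultiLabelLoss with HammingDistance/AccuracyScore metrics over 33 scene classes. A hedged numpy sketch of the usual sigmoid-plus-0.5-threshold decision and a Hamming-style score on made-up data (illustrative; not the ppcls metric implementation):
import numpy as np
num_classes = 33                                              # class_num in the config above
logits = np.random.randn(2, num_classes).astype(np.float32)  # made-up model outputs
probs = 1.0 / (1.0 + np.exp(-logits))                         # sigmoid: independent per-class scores
preds = (probs >= 0.5).astype(np.int64)                       # multi-hot prediction
labels = np.random.randint(0, 2, size=(2, num_classes))       # made-up multi-hot targets
hamming_distance = float((preds != labels).mean())            # fraction of mismatched label bits
print(preds.sum(axis=1), hamming_distance)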

View File

@ -0,0 +1,135 @@
# global configs
Global:
checkpoints: null
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams"
output_dir: "./output_vehicle_cls_prune/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 160
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
Slim:
prune:
name: fpgm
pruned_ratio: 0.3
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "ResNet50_last_stage_stride1"
pretrained: True
BackboneStopLayer:
name: "adaptive_avg_pool2d_0"
Neck:
name: "VehicleNeck"
in_channels: 2048
out_channels: 512
Head:
name: "ArcMargin"
embedding_size: 512
class_num: 431
margin: 0.15
scale: 32
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
- SupConLoss:
weight: 1.0
views: 2
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
bbox_crop: True
cls_label_path: "./dataset/CompCars/train_test_split/classification/train_label.txt"
transform_ops:
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- AugMix:
prob: 0.5
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: PKSampler
batch_size: 128
sample_per_id: 2
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
cls_label_path: "./dataset/CompCars/train_test_split/classification/test_label.txt"
bbox_crop: True
transform_ops:
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: False
loader:
num_workers: 8
use_shared_memory: True
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,134 @@
# global configs
Global:
checkpoints: null
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams"
output_dir: "./output_vehicle_cls_pact/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 80
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
Slim:
quant:
name: pact
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "ResNet50_last_stage_stride1"
pretrained: True
BackboneStopLayer:
name: "adaptive_avg_pool2d_0"
Neck:
name: "VehicleNeck"
in_channels: 2048
out_channels: 512
Head:
name: "ArcMargin"
embedding_size: 512
class_num: 431
margin: 0.15
scale: 32
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
- SupConLoss:
weight: 1.0
views: 2
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.001
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
bbox_crop: True
cls_label_path: "./dataset/CompCars/train_test_split/classification/train_label.txt"
transform_ops:
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- AugMix:
prob: 0.5
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: PKSampler
batch_size: 64
sample_per_id: 2
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
cls_label_path: "./dataset/CompCars/train_test_split/classification/test_label.txt"
bbox_crop: True
transform_ops:
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: False
loader:
num_workers: 8
use_shared_memory: True
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -1,8 +1,8 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: "./output/"
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams"
output_dir: "./output_vehicle_reid_prune/"
device: "gpu"
save_interval: 1
eval_during_train: True
@ -61,7 +61,6 @@ Optimizer:
lr:
name: Cosine
learning_rate: 0.01
last_epoch: -1
regularizer:
name: 'L2'
coeff: 0.0005
@ -96,9 +95,9 @@ DataLoader:
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
sample_per_id: 2
drop_last: False
shuffle: True
loader:

View File

@ -0,0 +1,161 @@
# global configs
Global:
checkpoints: null
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams"
output_dir: "./output_vehicle_reid_pact/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 40
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
eval_mode: "retrieval"
# for quantization or pruning the model
Slim:
## for prune
quant:
name: pact
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "ResNet50_last_stage_stride1"
pretrained: True
BackboneStopLayer:
name: "adaptive_avg_pool2d_0"
Neck:
name: "VehicleNeck"
in_channels: 2048
out_channels: 512
Head:
name: "ArcMargin"
embedding_size: 512
class_num: 30671
margin: 0.15
scale: 32
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
- SupConLoss:
weight: 1.0
views: 2
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.001
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: "VeriWild"
image_root: "./dataset/VeRI-Wild/images/"
cls_label_path: "./dataset/VeRI-Wild/train_test_split/train_list_start0.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- AugMix:
prob: 0.5
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: PKSampler
batch_size: 64
sample_per_id: 2
drop_last: False
shuffle: True
loader:
num_workers: 6
use_shared_memory: True
Eval:
Query:
dataset:
name: "VeriWild"
image_root: "./dataset/VeRI-Wild/images"
cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id_query.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 6
use_shared_memory: True
Gallery:
dataset:
name: "VeriWild"
image_root: "./dataset/VeRI-Wild/images"
cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 6
use_shared_memory: True
Metric:
Eval:
- Recallk:
topk: [1, 5]
- mAP: {}

View File

@ -26,9 +26,12 @@ from ppcls.data.dataloader.common_dataset import create_operators
from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
from ppcls.data.dataloader.logo_dataset import LogoDataset
from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
from ppcls.data.dataloader.mix_dataset import MixDataset
# sampler
from ppcls.data.dataloader.DistributedRandomIdentitySampler import DistributedRandomIdentitySampler
from ppcls.data.dataloader.pk_sampler import PKSampler
from ppcls.data.dataloader.mix_sampler import MixSampler
from ppcls.data import preprocess
from ppcls.data.preprocess import transform

View File

@ -0,0 +1,9 @@
from ppcls.data.dataloader.imagenet_dataset import ImageNetDataset
from ppcls.data.dataloader.multilabel_dataset import MultiLabelDataset
from ppcls.data.dataloader.common_dataset import create_operators
from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
from ppcls.data.dataloader.logo_dataset import LogoDataset
from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
from ppcls.data.dataloader.mix_dataset import MixDataset
from ppcls.data.dataloader.mix_sampler import MixSampler
from ppcls.data.dataloader.pk_sampler import PKSampler

View File

@ -0,0 +1,49 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import os
from paddle.io import Dataset
from .. import dataloader
class MixDataset(Dataset):
def __init__(self, datasets_config):
super().__init__()
self.dataset_list = []
start_idx = 0
end_idx = 0
for config_i in datasets_config:
dataset_name = config_i.pop('name')
dataset = getattr(dataloader, dataset_name)(**config_i)
end_idx += len(dataset)
self.dataset_list.append([end_idx, start_idx, dataset])
start_idx = end_idx
self.length = end_idx
def __getitem__(self, idx):
for dataset_i in self.dataset_list:
if dataset_i[0] > idx:
dataset_i_idx = idx - dataset_i[1]
return dataset_i[2][dataset_i_idx]
def __len__(self):
return self.length
def get_dataset_list(self):
return self.dataset_list
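MixDataset above concatenates several sub-datasets and resolves a global index through the cumulative [end_idx, start_idx, dataset] bookkeeping. A self-contained sketch of that lookup, with stand-in lists instead of real ppcls datasets:
# Stand-in lists instead of real ppcls datasets; only the index bookkeeping is shown.
sub_datasets = [["a0", "a1", "a2"], ["b0", "b1"]]
dataset_list = []
start_idx = end_idx = 0
for ds in sub_datasets:
    end_idx += len(ds)
    dataset_list.append([end_idx, start_idx, ds])  # same [end, start, dataset] layout as MixDataset
    start_idx = end_idx
def lookup(idx):
    # Walk the cumulative ranges until the one containing idx is found
    for end, start, ds in dataset_list:
        if end > idx:
            return ds[idx - start]
print([lookup(i) for i in range(end_idx)])  # ['a0', 'a1', 'a2', 'b0', 'b1']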

View File

@ -0,0 +1,79 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from paddle.io import DistributedBatchSampler, Sampler
from ppcls.utils import logger
from ppcls.data.dataloader.mix_dataset import MixDataset
from ppcls.data import dataloader
class MixSampler(DistributedBatchSampler):
def __init__(self, dataset, batch_size, sample_configs, iter_per_epoch):
super().__init__(dataset, batch_size)
assert isinstance(dataset,
MixDataset), "MixSampler only support MixDataset"
self.sampler_list = []
self.batch_size = batch_size
self.start_list = []
self.length = iter_per_epoch
dataset_list = dataset.get_dataset_list()
batch_size_left = self.batch_size
self.iter_list = []
for i, config_i in enumerate(sample_configs):
self.start_list.append(dataset_list[i][1])
sample_method = config_i.pop("name")
ratio_i = config_i.pop("ratio")
if i < len(sample_configs) - 1:
batch_size_i = int(self.batch_size * ratio_i)
batch_size_left -= batch_size_i
else:
batch_size_i = batch_size_left
assert batch_size_i <= len(dataset_list[i][2])
config_i["batch_size"] = batch_size_i
if sample_method == "DistributedBatchSampler":
sampler_i = DistributedBatchSampler(dataset_list[i][2],
**config_i)
else:
sampler_i = getattr(dataloader, sample_method)(
dataset_list[i][2], **config_i)
self.sampler_list.append(sampler_i)
self.iter_list.append(iter(sampler_i))
self.length += len(dataset_list[i][2]) * ratio_i
self.iter_counter = 0
def __iter__(self):
while self.iter_counter < self.length:
batch = []
for i, iter_i in enumerate(self.iter_list):
batch_i = next(iter_i, None)
if batch_i is None:
iter_i = iter(self.sampler_list[i])
self.iter_list[i] = iter_i
batch_i = next(iter_i, None)
assert batch_i is not None, "dataset {} returns None".format(
i)
batch += [idx + self.start_list[i] for idx in batch_i]
if len(batch) == self.batch_size:
self.iter_counter += 1
yield batch
else:
logger.info("Some dataset reaches end")
self.iter_counter = 0
def __len__(self):
return self.length

View File

@ -33,7 +33,7 @@ class MultiLabelDataset(CommonDataset):
with open(self._cls_path) as fd:
lines = fd.readlines()
for l in lines:
l = l.strip().split(" ")
l = l.strip().split("\t")
self.images.append(os.path.join(self._img_root, l[0]))
labels = l[1].split(',')
@ -44,13 +44,14 @@ class MultiLabelDataset(CommonDataset):
def __getitem__(self, idx):
try:
img = cv2.imread(self.images[idx])
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
with open(self.images[idx], 'rb') as f:
img = f.read()
if self._transform_ops:
img = transform(img, self._transform_ops)
img = img.transpose((2, 0, 1))
label = np.array(self.labels[idx]).astype("float32")
return (img, label)
except Exception as ex:
logger.error("Exception occured when parse line: {} with msg: {}".
format(self.images[idx], ex))

View File

@ -0,0 +1,105 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from collections import defaultdict
import numpy as np
import random
from paddle.io import DistributedBatchSampler
from ppcls.utils import logger
class PKSampler(DistributedBatchSampler):
"""
First, randomly sample P identities.
Then for each identity randomly sample K instances.
Therefore the batch size is P*K, and the sampler is called PKSampler.
Args:
dataset (paddle.io.Dataset): list of (img_path, pid, cam_id).
sample_per_id(int): number of instances per identity in a batch.
batch_size (int): number of examples in a batch.
shuffle(bool): whether to shuffle indices order before generating
batch indices. Default True.
"""
def __init__(self,
dataset,
batch_size,
sample_per_id,
shuffle=True,
drop_last=True,
sample_method="sample_avg_prob"):
super().__init__(
dataset, batch_size, shuffle=shuffle, drop_last=drop_last)
assert batch_size % sample_per_id == 0, \
"PKSampler configs error, Sample_per_id must be a divisor of batch_size."
assert hasattr(self.dataset,
"labels"), "Dataset must have labels attribute."
self.sample_per_label = sample_per_id
self.label_dict = defaultdict(list)
self.sample_method = sample_method
for idx, label in enumerate(self.dataset.labels):
self.label_dict[label].append(idx)
self.label_list = list(self.label_dict)
assert len(self.label_list) * self.sample_per_label > self.batch_size, \
"batch size should be smaller than "
if self.sample_method == "id_avg_prob":
self.prob_list = np.array([1 / len(self.label_list)] *
len(self.label_list))
elif self.sample_method == "sample_avg_prob":
counter = []
for label_i in self.label_list:
counter.append(len(self.label_dict[label_i]))
self.prob_list = np.array(counter) / sum(counter)
else:
logger.error(
"PKSampler only support id_avg_prob and sample_avg_prob sample method, "
"but receive {}.".format(self.sample_method))
diff = np.abs(sum(self.prob_list) - 1)
if diff > 0.00000001:
self.prob_list[-1] = 1 - sum(self.prob_list[:-1])
if self.prob_list[-1] > 1 or self.prob_list[-1] < 0:
logger.error("PKSampler prob list error")
else:
logger.info(
"PKSampler: sum of prob list not equal to 1, diff is {}, change the last prob".format(diff)
)
def __iter__(self):
label_per_batch = self.batch_size // self.sample_per_label
for _ in range(len(self)):
batch_index = []
batch_label_list = np.random.choice(
self.label_list,
size=label_per_batch,
replace=False,
p=self.prob_list)
for label_i in batch_label_list:
label_i_indexes = self.label_dict[label_i]
if self.sample_per_label <= len(label_i_indexes):
batch_index.extend(
np.random.choice(
label_i_indexes,
size=self.sample_per_label,
replace=False))
else:
batch_index.extend(
np.random.choice(
label_i_indexes,
size=self.sample_per_label,
replace=True))
if not self.drop_last or len(batch_index) == self.batch_size:
yield batch_index
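The PKSampler above builds each batch from P identities with K (sample_per_id) instances each. A standalone numpy sketch of one batch draw using the same sample_avg_prob weighting on made-up labels (illustrative; the real sampler also inherits distributed sharding from DistributedBatchSampler):
import numpy as np
from collections import defaultdict
labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]      # made-up identity labels for 10 samples
batch_size, sample_per_id = 4, 2             # P = batch_size // sample_per_id identities per batch
label_dict = defaultdict(list)
for idx, label in enumerate(labels):
    label_dict[label].append(idx)
label_list = list(label_dict)
# "sample_avg_prob": identities are drawn proportionally to their number of samples
counter = np.array([len(label_dict[l]) for l in label_list], dtype=np.float64)
prob_list = counter / counter.sum()
chosen = np.random.choice(label_list, size=batch_size // sample_per_id,
                          replace=False, p=prob_list)
batch_index = []
for label_i in chosen:
    pool = label_dict[label_i]
    batch_index.extend(np.random.choice(pool, size=sample_per_id,
                                        replace=len(pool) < sample_per_id))
print(batch_index)  # 4 indices: 2 identities x 2 instances each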

View File

@ -16,7 +16,7 @@ import importlib
from . import topk
from .topk import Topk
from .topk import Topk, MultiLabelTopk
def build_postprocess(config):

View File

@ -45,15 +45,17 @@ class Topk(object):
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None):
def __call__(self, x, file_names=None, multilabel=False):
assert isinstance(x, paddle.Tensor)
if file_names is not None:
assert x.shape[0] == len(file_names)
x = F.softmax(x, axis=-1)
x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
x = x.numpy()
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32")
index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
"int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
@ -73,3 +75,11 @@ class Topk(object):
result["label_names"] = label_name_list
y.append(result)
return y
class MultiLabelTopk(Topk):
def __init__(self, topk=1, class_id_map_file=None):
super().__init__()
def __call__(self, x, file_names=None):
return super().__call__(x, file_names, multilabel=True)
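The change above keeps softmax top-k selection for single-label outputs and switches to sigmoid with a fixed 0.5 threshold when multilabel=True. A small numpy sketch of the two index selections on made-up probabilities:
import numpy as np
# Single-label path: softmax probabilities, keep the top-k class indices
probs_softmax = np.array([0.05, 0.60, 0.25, 0.10])   # made-up values
topk = 2
print(probs_softmax.argsort(axis=0)[-topk:][::-1].astype("int32"))   # [1 2]
# Multi-label path: sigmoid probabilities, keep every class at or above 0.5
probs_sigmoid = np.array([0.9, 0.2, 0.7, 0.4])        # made-up values
print(np.where(probs_sigmoid >= 0.5)[0].astype("int32"))             # [0 2]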

View File

@ -14,6 +14,7 @@
from ppcls.data.preprocess.ops.autoaugment import ImageNetPolicy as RawImageNetPolicy
from ppcls.data.preprocess.ops.randaugment import RandAugment as RawRandAugment
from ppcls.data.preprocess.ops.timm_autoaugment import RawTimmAutoAugment
from ppcls.data.preprocess.ops.cutout import Cutout
from ppcls.data.preprocess.ops.hide_and_seek import HideAndSeek
@ -29,9 +30,8 @@ from ppcls.data.preprocess.ops.operators import NormalizeImage
from ppcls.data.preprocess.ops.operators import ToCHWImage
from ppcls.data.preprocess.ops.operators import AugMix
from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, FmixOperator
from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, OpSampler, FmixOperator
import six
import numpy as np
from PIL import Image
@ -45,21 +45,16 @@ def transform(data, ops=[]):
class AutoAugment(RawImageNetPolicy):
""" ImageNetPolicy wrapper to auto fit different img types """
def __init__(self, *args, **kwargs):
if six.PY2:
super(AutoAugment, self).__init__(*args, **kwargs)
else:
super().__init__(*args, **kwargs)
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
if six.PY2:
img = super(AutoAugment, self).__call__(img)
else:
img = super().__call__(img)
img = super().__call__(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
@ -69,21 +64,35 @@ class AutoAugment(RawImageNetPolicy):
class RandAugment(RawRandAugment):
""" RandAugment wrapper to auto fit different img types """
def __init__(self, *args, **kwargs):
if six.PY2:
super(RandAugment, self).__init__(*args, **kwargs)
else:
super().__init__(*args, **kwargs)
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
if six.PY2:
img = super(RandAugment, self).__call__(img)
else:
img = super().__call__(img)
img = super().__call__(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
class TimmAutoAugment(RawTimmAutoAugment):
""" TimmAutoAugment wrapper to auto fit different img tyeps. """
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super().__call__(img)
if isinstance(img, Image.Image):
img = np.asarray(img)

View File

@ -16,13 +16,17 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import random
import numpy as np
from ppcls.utils import logger
from ppcls.data.preprocess.ops.fmix import sample_mask
class BatchOperator(object):
""" BatchOperator """
def __init__(self, *args, **kwargs):
pass
@ -46,9 +50,20 @@ class BatchOperator(object):
class MixupOperator(BatchOperator):
""" Mixup operator """
def __init__(self, alpha=0.2):
assert alpha > 0., \
'parameter alpha[%f] should > 0.0' % (alpha)
def __init__(self, alpha: float=1.):
"""Build Mixup operator
Args:
alpha (float, optional): The parameter alpha of mixup. Defaults to 1.0.
Raises:
Exception: The value of parameter is illegal.
"""
if alpha <= 0:
raise Exception(
f"Parameter \"alpha\" of Mixup should be greater than 0. \"alpha\": {alpha}."
)
self._alpha = alpha
def __call__(self, batch):
@ -62,9 +77,20 @@ class MixupOperator(BatchOperator):
class CutmixOperator(BatchOperator):
""" Cutmix operator """
def __init__(self, alpha=0.2):
assert alpha > 0., \
'parameter alpha[%f] should > 0.0' % (alpha)
"""Build Cutmix operator
Args:
alpha (float, optional): The parameter alpha of cutmix. Defaults to 0.2.
Raises:
Exception: The value of parameter is illegal.
"""
if alpha <= 0:
raise Exception(
f"Parameter \"alpha\" of Cutmix should be greater than 0. \"alpha\": {alpha}."
)
self._alpha = alpha
def _rand_bbox(self, size, lam):
@ -72,8 +98,8 @@ class CutmixOperator(BatchOperator):
w = size[2]
h = size[3]
cut_rat = np.sqrt(1. - lam)
cut_w = np.int(w * cut_rat)
cut_h = np.int(h * cut_rat)
cut_w = int(w * cut_rat)
cut_h = int(h * cut_rat)
# uniform
cx = np.random.randint(w)
@ -101,6 +127,7 @@ class CutmixOperator(BatchOperator):
class FmixOperator(BatchOperator):
""" Fmix operator """
def __init__(self, alpha=1, decay_power=3, max_soft=0., reformulate=False):
self._alpha = alpha
self._decay_power = decay_power
@ -115,3 +142,42 @@ class FmixOperator(BatchOperator):
size, self._max_soft, self._reformulate)
imgs = mask * imgs + (1 - mask) * imgs[idx]
return list(zip(imgs, labels, labels[idx], [lam] * bs))
class OpSampler(object):
""" Sample a operator from """
def __init__(self, **op_dict):
"""Build OpSampler
Raises:
Exception: The parameter \"prob\" of the operator(s) is set incorrectly.
"""
if len(op_dict) < 1:
msg = f"ConfigWarning: No operator in \"OpSampler\". \"OpSampler\" has been skipped."
self.ops = {}
total_prob = 0
for op_name in op_dict:
param = op_dict[op_name]
if "prob" not in param:
msg = f"ConfigWarning: Parameter \"prob\" should be set when use operator in \"OpSampler\". The operator \"{op_name}\"'s prob has been set \"0\"."
logger.warning(msg)
prob = param.pop("prob", 0)
total_prob += prob
op = eval(op_name)(**param)
self.ops.update({op: prob})
if total_prob > 1:
msg = f"ConfigError: The total prob of operators in \"OpSampler\" should be less 1."
logger.error(msg)
raise Exception(msg)
# add "None Op" when total_prob < 1, "None Op" do nothing
self.ops[None] = 1 - total_prob
def __call__(self, batch):
op = random.choices(
list(self.ops.keys()), weights=list(self.ops.values()), k=1)[0]
# return batch directly when None Op
return op(batch) if op else batch
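OpSampler draws one batch operator per batch with random.choices and gives an implicit no-op the remaining probability mass. A hedged sketch of that draw plus the core mixup blend it would trigger, with a made-up batch and the alpha=0.8 from the configs above (the real MixupOperator also pairs labels and returns lam for the loss):
import random
import numpy as np
# Probabilities from the config above: Mixup 0.5, Cutmix 0.5, so the implicit no-op gets 0
ops = {"MixupOperator": 0.5, "CutmixOperator": 0.5, None: 0.0}
sampled = random.choices(list(ops.keys()), weights=list(ops.values()), k=1)[0]
print("sampled op:", sampled)
# Core mixup blend: a Beta-distributed weight mixes the batch with a shuffled copy of itself
alpha = 0.8
imgs = np.random.rand(4, 3, 224, 224).astype("float32")   # made-up batch
lam = np.random.beta(alpha, alpha)
perm = np.random.permutation(imgs.shape[0])
mixed = lam * imgs + (1 - lam) * imgs[perm]
print(mixed.shape, round(float(lam), 3))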

View File

@ -19,15 +19,62 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from functools import partial
import six
import math
import random
import cv2
import numpy as np
from PIL import Image
from paddle.vision.transforms import ColorJitter as RawColorJitter
from .autoaugment import ImageNetPolicy
from .functional import augmentations
from ppcls.utils import logger
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
logger.warning(
f"The backend of Resize only support \"cv2\" or \"PIL\". \"f{backend}\" is unavailable. Use \"cv2\" instead."
)
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
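A hedged usage sketch of UnifiedResize; the import path below is an assumption and may differ in your checkout. Both backends accept an HWC uint8 array and a (w, h) target size.

import numpy as np
from ppcls.data.preprocess.ops.operators import UnifiedResize  # assumed path

img = np.zeros((240, 320, 3), dtype=np.uint8)                # dummy image
resize_cv2 = UnifiedResize(interpolation="bilinear", backend="cv2")
resize_pil = UnifiedResize(interpolation="bilinear", backend="pil")
assert resize_cv2(img, (224, 224)).shape[:2] == (224, 224)
assert resize_pil(img, (224, 224)).shape[:2] == (224, 224)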
class OperatorParamError(ValueError):
@ -67,8 +114,11 @@ class DecodeImage(object):
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size=None,
resize_short=None,
interpolation=None,
backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
@ -81,6 +131,9 @@ class ResizeImage(object):
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
@ -90,10 +143,7 @@ class ResizeImage(object):
else:
w = self.w
h = self.h
if self.interpolation is None:
return cv2.resize(img, (w, h))
else:
return cv2.resize(img, (w, h), interpolation=self.interpolation)
return self._resize_func(img, (w, h))
class CropImage(object):
@ -119,9 +169,12 @@ class CropImage(object):
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size,
scale=None,
ratio=None,
interpolation=None,
backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
@ -130,6 +183,9 @@ class RandCropImage(object):
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
@ -155,10 +211,8 @@ class RandCropImage(object):
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
if self.interpolation is None:
return cv2.resize(img, size)
else:
return cv2.resize(img, size, interpolation=self.interpolation)
return self._resize_func(img, size)
class RandFlipImage(object):
@ -313,3 +367,20 @@ class AugMix(object):
mixed = (1 - m) * image + m * mix
return mixed.astype(np.uint8)
class ColorJitter(RawColorJitter):
"""ColorJitter.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super()._apply_image(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
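A hedged sketch of what the wrapper above does for ndarray inputs: convert to PIL, apply paddle.vision's ColorJitter, and convert back. Parameter values are illustrative only.

import numpy as np
from PIL import Image
from paddle.vision.transforms import ColorJitter as RawColorJitter

img = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
jitter = RawColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)
out = np.asarray(jitter(Image.fromarray(img)))  # same ndarray -> PIL -> ndarray round trip as the wrapper
print(out.shape)  # (224, 224, 3)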

View File

@ -12,7 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#This code is based on https://github.com/zhunzhong07/Random-Erasing
#This code is adapted from https://github.com/zhunzhong07/Random-Erasing, with reference to Timm.
from functools import partial
import math
import random
@ -20,36 +22,69 @@ import random
import numpy as np
class Pixels(object):
def __init__(self, mode="const", mean=[0., 0., 0.]):
self._mode = mode
self._mean = mean
def __call__(self, h=224, w=224, c=3):
if self._mode == "rand":
return np.random.normal(size=(1, 1, 3))
elif self._mode == "pixel":
return np.random.normal(size=(h, w, c))
elif self._mode == "const":
return self._mean
else:
raise Exception(
"Invalid mode in RandomErasing, only support \"const\", \"rand\", \"pixel\""
)
class RandomErasing(object):
def __init__(self, EPSILON=0.5, sl=0.02, sh=0.4, r1=0.3,
mean=[0., 0., 0.]):
self.EPSILON = EPSILON
self.mean = mean
self.sl = sl
self.sh = sh
self.r1 = r1
"""RandomErasing.
"""
def __init__(self,
EPSILON=0.5,
sl=0.02,
sh=0.4,
r1=0.3,
mean=[0., 0., 0.],
attempt=100,
use_log_aspect=False,
mode='const'):
self.EPSILON = eval(EPSILON) if isinstance(EPSILON, str) else EPSILON
self.sl = eval(sl) if isinstance(sl, str) else sl
self.sh = eval(sh) if isinstance(sh, str) else sh
r1 = eval(r1) if isinstance(r1, str) else r1
self.r1 = (math.log(r1), math.log(1 / r1)) if use_log_aspect else (
r1, 1 / r1)
self.use_log_aspect = use_log_aspect
self.attempt = attempt
self.get_pixels = Pixels(mode, mean)
def __call__(self, img):
if random.uniform(0, 1) > self.EPSILON:
if random.random() > self.EPSILON:
return img
for _ in range(100):
for _ in range(self.attempt):
area = img.shape[0] * img.shape[1]
target_area = random.uniform(self.sl, self.sh) * area
aspect_ratio = random.uniform(self.r1, 1 / self.r1)
aspect_ratio = random.uniform(*self.r1)
if self.use_log_aspect:
aspect_ratio = math.exp(aspect_ratio)
h = int(round(math.sqrt(target_area * aspect_ratio)))
w = int(round(math.sqrt(target_area / aspect_ratio)))
if w < img.shape[1] and h < img.shape[0]:
pixels = self.get_pixels(h, w, img.shape[2])
x1 = random.randint(0, img.shape[0] - h)
y1 = random.randint(0, img.shape[1] - w)
if img.shape[0] == 3:
img[x1:x1 + h, y1:y1 + w, 0] = self.mean[0]
img[x1:x1 + h, y1:y1 + w, 1] = self.mean[1]
img[x1:x1 + h, y1:y1 + w, 2] = self.mean[2]
if img.shape[2] == 3:
img[x1:x1 + h, y1:y1 + w, :] = pixels
else:
img[0, x1:x1 + h, y1:y1 + w] = self.mean[1]
img[x1:x1 + h, y1:y1 + w, 0] = pixels[0]
return img
return img
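A hedged usage sketch of the extended RandomErasing; the import path is an assumption. With mode="pixel" the erased rectangle is filled with per-pixel Gaussian noise, and use_log_aspect samples the aspect ratio in log space.

import numpy as np
from ppcls.data.preprocess.ops.random_erasing import RandomErasing  # assumed path

eraser = RandomErasing(EPSILON=1.0, sl=0.02, sh=0.4, r1=0.3,
                       mode="pixel", use_log_aspect=True)
img = np.random.randint(0, 255, (224, 224, 3)).astype(np.float32)
out = eraser(img)  # one random rectangle replaced by noise (EPSILON=1.0 forces erasing)
print(out.shape)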

View File

@ -0,0 +1,879 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This code is borrowed from Timm: https://github.com/rwightman/pytorch-image-models.
hacked together by / Copyright 2020 Ross Wightman
"""
import random
import math
import re
from PIL import Image, ImageOps, ImageEnhance, ImageChops
import PIL
import numpy as np
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
_PIL_VER = tuple([int(x) for x in PIL.__version__.split('.')[:2]])
_FILL = (128, 128, 128)
# This signifies the max integer that the controller RNN could predict for the
# augmentation scheme.
_MAX_LEVEL = 10.
_HPARAMS_DEFAULT = dict(
translate_const=250,
img_mean=_FILL, )
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
def _pil_interp(method):
if method == 'bicubic':
return Image.BICUBIC
elif method == 'lanczos':
return Image.LANCZOS
elif method == 'hamming':
return Image.HAMMING
else:
# default bilinear, do we want to allow nearest?
return Image.BILINEAR
def _interpolation(kwargs):
interpolation = kwargs.pop('resample', Image.BILINEAR)
if isinstance(interpolation, (list, tuple)):
return random.choice(interpolation)
else:
return interpolation
def _check_args_tf(kwargs):
if 'fillcolor' in kwargs and _PIL_VER < (5, 0):
kwargs.pop('fillcolor')
kwargs['resample'] = _interpolation(kwargs)
def shear_x(img, factor, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, factor, 0, 0, 1, 0),
**kwargs)
def shear_y(img, factor, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, factor, 1, 0),
**kwargs)
def translate_x_rel(img, pct, **kwargs):
pixels = pct * img.size[0]
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
**kwargs)
def translate_y_rel(img, pct, **kwargs):
pixels = pct * img.size[1]
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
**kwargs)
def translate_x_abs(img, pixels, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
**kwargs)
def translate_y_abs(img, pixels, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
**kwargs)
def rotate(img, degrees, **kwargs):
_check_args_tf(kwargs)
if _PIL_VER >= (5, 2):
return img.rotate(degrees, **kwargs)
elif _PIL_VER >= (5, 0):
w, h = img.size
post_trans = (0, 0)
rotn_center = (w / 2.0, h / 2.0)
angle = -math.radians(degrees)
matrix = [
round(math.cos(angle), 15),
round(math.sin(angle), 15),
0.0,
round(-math.sin(angle), 15),
round(math.cos(angle), 15),
0.0,
]
def transform(x, y, matrix):
(a, b, c, d, e, f) = matrix
return a * x + b * y + c, d * x + e * y + f
matrix[2], matrix[5] = transform(-rotn_center[0] - post_trans[0],
-rotn_center[1] - post_trans[1],
matrix)
matrix[2] += rotn_center[0]
matrix[5] += rotn_center[1]
return img.transform(img.size, Image.AFFINE, matrix, **kwargs)
else:
return img.rotate(degrees, resample=kwargs['resample'])
def auto_contrast(img, **__):
return ImageOps.autocontrast(img)
def invert(img, **__):
return ImageOps.invert(img)
def equalize(img, **__):
return ImageOps.equalize(img)
def solarize(img, thresh, **__):
return ImageOps.solarize(img, thresh)
def solarize_add(img, add, thresh=128, **__):
lut = []
for i in range(256):
if i < thresh:
lut.append(min(255, i + add))
else:
lut.append(i)
if img.mode in ("L", "RGB"):
if img.mode == "RGB" and len(lut) == 256:
lut = lut + lut + lut
return img.point(lut)
else:
return img
def posterize(img, bits_to_keep, **__):
if bits_to_keep >= 8:
return img
return ImageOps.posterize(img, bits_to_keep)
def contrast(img, factor, **__):
return ImageEnhance.Contrast(img).enhance(factor)
def color(img, factor, **__):
return ImageEnhance.Color(img).enhance(factor)
def brightness(img, factor, **__):
return ImageEnhance.Brightness(img).enhance(factor)
def sharpness(img, factor, **__):
return ImageEnhance.Sharpness(img).enhance(factor)
def _randomly_negate(v):
"""With 50% prob, negate the value"""
return -v if random.random() > 0.5 else v
def _rotate_level_to_arg(level, _hparams):
# range [-30, 30]
level = (level / _MAX_LEVEL) * 30.
level = _randomly_negate(level)
return level,
def _enhance_level_to_arg(level, _hparams):
# range [0.1, 1.9]
return (level / _MAX_LEVEL) * 1.8 + 0.1,
def _enhance_increasing_level_to_arg(level, _hparams):
# the 'no change' level is 1.0, moving away from that towards 0. or 2.0 increases the enhancement blend
# range [0.1, 1.9]
level = (level / _MAX_LEVEL) * .9
level = 1.0 + _randomly_negate(level)
return level,
def _shear_level_to_arg(level, _hparams):
# range [-0.3, 0.3]
level = (level / _MAX_LEVEL) * 0.3
level = _randomly_negate(level)
return level,
def _translate_abs_level_to_arg(level, hparams):
translate_const = hparams['translate_const']
level = (level / _MAX_LEVEL) * float(translate_const)
level = _randomly_negate(level)
return level,
def _translate_rel_level_to_arg(level, hparams):
# default range [-0.45, 0.45]
translate_pct = hparams.get('translate_pct', 0.45)
level = (level / _MAX_LEVEL) * translate_pct
level = _randomly_negate(level)
return level,
def _posterize_level_to_arg(level, _hparams):
# As per Tensorflow TPU EfficientNet impl
# range [0, 4], 'keep 0 up to 4 MSB of original image'
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 4),
def _posterize_increasing_level_to_arg(level, hparams):
# As per Tensorflow models research and UDA impl
# range [4, 0], 'keep 4 down to 0 MSB of original image',
# intensity/severity of augmentation increases with level
return 4 - _posterize_level_to_arg(level, hparams)[0],
def _posterize_original_level_to_arg(level, _hparams):
# As per original AutoAugment paper description
# range [4, 8], 'keep 4 up to 8 MSB of image'
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 4) + 4,
def _solarize_level_to_arg(level, _hparams):
# range [0, 256]
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 256),
def _solarize_increasing_level_to_arg(level, _hparams):
# range [0, 256]
# intensity/severity of augmentation increases with level
return 256 - _solarize_level_to_arg(level, _hparams)[0],
def _solarize_add_level_to_arg(level, _hparams):
# range [0, 110]
return int((level / _MAX_LEVEL) * 110),
LEVEL_TO_ARG = {
'AutoContrast': None,
'Equalize': None,
'Invert': None,
'Rotate': _rotate_level_to_arg,
# There are several variations of the posterize level scaling in various Tensorflow/Google repositories/papers
'Posterize': _posterize_level_to_arg,
'PosterizeIncreasing': _posterize_increasing_level_to_arg,
'PosterizeOriginal': _posterize_original_level_to_arg,
'Solarize': _solarize_level_to_arg,
'SolarizeIncreasing': _solarize_increasing_level_to_arg,
'SolarizeAdd': _solarize_add_level_to_arg,
'Color': _enhance_level_to_arg,
'ColorIncreasing': _enhance_increasing_level_to_arg,
'Contrast': _enhance_level_to_arg,
'ContrastIncreasing': _enhance_increasing_level_to_arg,
'Brightness': _enhance_level_to_arg,
'BrightnessIncreasing': _enhance_increasing_level_to_arg,
'Sharpness': _enhance_level_to_arg,
'SharpnessIncreasing': _enhance_increasing_level_to_arg,
'ShearX': _shear_level_to_arg,
'ShearY': _shear_level_to_arg,
'TranslateX': _translate_abs_level_to_arg,
'TranslateY': _translate_abs_level_to_arg,
'TranslateXRel': _translate_rel_level_to_arg,
'TranslateYRel': _translate_rel_level_to_arg,
}
NAME_TO_OP = {
'AutoContrast': auto_contrast,
'Equalize': equalize,
'Invert': invert,
'Rotate': rotate,
'Posterize': posterize,
'PosterizeIncreasing': posterize,
'PosterizeOriginal': posterize,
'Solarize': solarize,
'SolarizeIncreasing': solarize,
'SolarizeAdd': solarize_add,
'Color': color,
'ColorIncreasing': color,
'Contrast': contrast,
'ContrastIncreasing': contrast,
'Brightness': brightness,
'BrightnessIncreasing': brightness,
'Sharpness': sharpness,
'SharpnessIncreasing': sharpness,
'ShearX': shear_x,
'ShearY': shear_y,
'TranslateX': translate_x_abs,
'TranslateY': translate_y_abs,
'TranslateXRel': translate_x_rel,
'TranslateYRel': translate_y_rel,
}
class AugmentOp(object):
def __init__(self, name, prob=0.5, magnitude=10, hparams=None):
hparams = hparams or _HPARAMS_DEFAULT
self.aug_fn = NAME_TO_OP[name]
self.level_fn = LEVEL_TO_ARG[name]
self.prob = prob
self.magnitude = magnitude
self.hparams = hparams.copy()
self.kwargs = dict(
fillcolor=hparams['img_mean'] if 'img_mean' in hparams else _FILL,
resample=hparams['interpolation']
if 'interpolation' in hparams else _RANDOM_INTERPOLATION, )
# If magnitude_std is > 0, we introduce some randomness
# in the usually fixed policy and sample magnitude from a normal distribution
# with mean `magnitude` and std-dev of `magnitude_std`.
# NOTE This is my own hack, being tested, not in papers or reference impls.
self.magnitude_std = self.hparams.get('magnitude_std', 0)
def __call__(self, img):
if self.prob < 1.0 and random.random() > self.prob:
return img
magnitude = self.magnitude
if self.magnitude_std and self.magnitude_std > 0:
magnitude = random.gauss(magnitude, self.magnitude_std)
magnitude = min(_MAX_LEVEL, max(0, magnitude)) # clip to valid range
level_args = self.level_fn(
magnitude, self.hparams) if self.level_fn is not None else tuple()
return self.aug_fn(img, *level_args, **self.kwargs)
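To make the magnitude scaling above concrete, a small worked sketch (plain arithmetic, assuming the default _MAX_LEVEL of 10):

magnitude = 9
rotate_deg = (magnitude / 10.) * 30.            # 27.0 degrees, randomly negated at call time
enhance_factor = (magnitude / 10.) * 1.8 + 0.1  # 1.72 for Color/Contrast/Brightness/Sharpness
shear = (magnitude / 10.) * 0.3                 # 0.27 for ShearX/ShearY
print(rotate_deg, enhance_factor, shear)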
def auto_augment_policy_v0(hparams):
# ImageNet v0 policy from TPU EfficientNet impl, cannot find a paper reference.
policy = [
[('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
[('Color', 0.4, 9), ('Equalize', 0.6, 3)],
[('Color', 0.4, 1), ('Rotate', 0.6, 8)],
[('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
[('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
[('Color', 0.2, 0), ('Equalize', 0.8, 8)],
[('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
[('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
[('Color', 0.6, 1), ('Equalize', 1.0, 2)],
[('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
[('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
[('Color', 0.4, 7), ('Equalize', 0.6, 0)],
[('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)],
[('Solarize', 0.6, 8), ('Color', 0.6, 9)],
[('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
[('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
[('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
[('ShearY', 0.8, 0), ('Color', 0.6, 4)],
[('Color', 1.0, 0), ('Rotate', 0.6, 2)],
[('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
[('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
[('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
[('Posterize', 0.8, 2), ('Solarize', 0.6, 10)
], # This results in a black image with TPU posterize
[('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
[('Color', 0.8, 6), ('Rotate', 0.4, 5)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_v0r(hparams):
# ImageNet v0 policy from TPU EfficientNet impl, with variation of Posterize used
# in Google research implementation (number of bits discarded increases with magnitude)
policy = [
[('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
[('Color', 0.4, 9), ('Equalize', 0.6, 3)],
[('Color', 0.4, 1), ('Rotate', 0.6, 8)],
[('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
[('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
[('Color', 0.2, 0), ('Equalize', 0.8, 8)],
[('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
[('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
[('Color', 0.6, 1), ('Equalize', 1.0, 2)],
[('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
[('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
[('Color', 0.4, 7), ('Equalize', 0.6, 0)],
[('PosterizeIncreasing', 0.4, 6), ('AutoContrast', 0.4, 7)],
[('Solarize', 0.6, 8), ('Color', 0.6, 9)],
[('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
[('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
[('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
[('ShearY', 0.8, 0), ('Color', 0.6, 4)],
[('Color', 1.0, 0), ('Rotate', 0.6, 2)],
[('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
[('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
[('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
[('PosterizeIncreasing', 0.8, 2), ('Solarize', 0.6, 10)],
[('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
[('Color', 0.8, 6), ('Rotate', 0.4, 5)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_original(hparams):
# ImageNet policy from https://arxiv.org/abs/1805.09501
policy = [
[('PosterizeOriginal', 0.4, 8), ('Rotate', 0.6, 9)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
[('PosterizeOriginal', 0.6, 7), ('PosterizeOriginal', 0.6, 6)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
[('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
[('PosterizeOriginal', 0.8, 5), ('Equalize', 1.0, 2)],
[('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
[('Equalize', 0.6, 8), ('PosterizeOriginal', 0.4, 6)],
[('Rotate', 0.8, 8), ('Color', 0.4, 0)],
[('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
[('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Rotate', 0.8, 8), ('Color', 1.0, 2)],
[('Color', 0.8, 8), ('Solarize', 0.8, 7)],
[('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
[('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
[('Color', 0.4, 0), ('Equalize', 0.6, 3)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_originalr(hparams):
# ImageNet policy from https://arxiv.org/abs/1805.09501 with research posterize variation
policy = [
[('PosterizeIncreasing', 0.4, 8), ('Rotate', 0.6, 9)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
[('PosterizeIncreasing', 0.6, 7), ('PosterizeIncreasing', 0.6, 6)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
[('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
[('PosterizeIncreasing', 0.8, 5), ('Equalize', 1.0, 2)],
[('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
[('Equalize', 0.6, 8), ('PosterizeIncreasing', 0.4, 6)],
[('Rotate', 0.8, 8), ('Color', 0.4, 0)],
[('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
[('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Rotate', 0.8, 8), ('Color', 1.0, 2)],
[('Color', 0.8, 8), ('Solarize', 0.8, 7)],
[('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
[('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
[('Color', 0.4, 0), ('Equalize', 0.6, 3)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy(name='v0', hparams=None):
hparams = hparams or _HPARAMS_DEFAULT
if name == 'original':
return auto_augment_policy_original(hparams)
elif name == 'originalr':
return auto_augment_policy_originalr(hparams)
elif name == 'v0':
return auto_augment_policy_v0(hparams)
elif name == 'v0r':
return auto_augment_policy_v0r(hparams)
else:
assert False, 'Unknown AA policy (%s)' % name
class AutoAugment(object):
def __init__(self, policy):
self.policy = policy
def __call__(self, img):
sub_policy = random.choice(self.policy)
for op in sub_policy:
img = op(img)
return img
def auto_augment_transform(config_str, hparams):
"""
Create an AutoAugment transform
:param config_str: String defining configuration of auto augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the AutoAugment policy (one of 'v0', 'v0r', 'original', 'originalr').
The remaining sections, which are not order specific, determine
'mstd' - float std deviation of magnitude noise applied
Ex 'original-mstd0.5' results in AutoAugment with original policy, magnitude_std 0.5
:param hparams: Other hparams (kwargs) for the AutoAugmentation scheme
:return: A callable Transform Op
"""
config = config_str.split('-')
policy_name = config[0]
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
else:
assert False, 'Unknown AutoAugment config section'
aa_policy = auto_augment_policy(policy_name, hparams=hparams)
return AutoAugment(aa_policy)
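A hedged usage sketch of the config string described in the docstring above; the import path and hparams values are assumptions.

from PIL import Image
from ppcls.data.preprocess.ops.timm_autoaugment import auto_augment_transform  # assumed path

hparams = dict(translate_const=100, img_mean=(128, 128, 128))
aa = auto_augment_transform("original-mstd0.5", hparams)  # original policy, magnitude_std 0.5
out = aa(Image.new("RGB", (224, 224), (127, 127, 127)))   # one random sub-policy (two ops) applied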
_RAND_TRANSFORMS = [
'AutoContrast',
'Equalize',
'Invert',
'Rotate',
'Posterize',
'Solarize',
'SolarizeAdd',
'Color',
'Contrast',
'Brightness',
'Sharpness',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
#'Cutout' # NOTE I've implemented this as random erasing separately
]
_RAND_INCREASING_TRANSFORMS = [
'AutoContrast',
'Equalize',
'Invert',
'Rotate',
'PosterizeIncreasing',
'SolarizeIncreasing',
'SolarizeAdd',
'ColorIncreasing',
'ContrastIncreasing',
'BrightnessIncreasing',
'SharpnessIncreasing',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
#'Cutout' # NOTE I've implemented this as random erasing separately
]
# These experimental weights are based loosely on the relative improvements mentioned in the paper.
# They may not result in increased performance, but could likely be tuned to do so.
_RAND_CHOICE_WEIGHTS_0 = {
'Rotate': 0.3,
'ShearX': 0.2,
'ShearY': 0.2,
'TranslateXRel': 0.1,
'TranslateYRel': 0.1,
'Color': .025,
'Sharpness': 0.025,
'AutoContrast': 0.025,
'Solarize': .005,
'SolarizeAdd': .005,
'Contrast': .005,
'Brightness': .005,
'Equalize': .005,
'Posterize': 0,
'Invert': 0,
}
def _select_rand_weights(weight_idx=0, transforms=None):
transforms = transforms or _RAND_TRANSFORMS
assert weight_idx == 0 # only one set of weights currently
rand_weights = _RAND_CHOICE_WEIGHTS_0
probs = [rand_weights[k] for k in transforms]
probs /= np.sum(probs)
return probs
def rand_augment_ops(magnitude=10, hparams=None, transforms=None):
hparams = hparams or _HPARAMS_DEFAULT
transforms = transforms or _RAND_TRANSFORMS
return [
AugmentOp(
name, prob=0.5, magnitude=magnitude, hparams=hparams)
for name in transforms
]
class RandAugment(object):
def __init__(self, ops, num_layers=2, choice_weights=None):
self.ops = ops
self.num_layers = num_layers
self.choice_weights = choice_weights
def __call__(self, img):
# no replacement when using weighted choice
ops = np.random.choice(
self.ops,
self.num_layers,
replace=self.choice_weights is None,
p=self.choice_weights)
for op in ops:
img = op(img)
return img
def rand_augment_transform(config_str, hparams):
"""
Create a RandAugment transform
:param config_str: String defining configuration of random augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the specific variant of rand augment (currently only 'rand'). The remaining
sections, which are not order specific, determine
'm' - integer magnitude of rand augment
'n' - integer num layers (number of transform ops selected per image)
'w' - integer probability weight index (index of a set of weights to influence choice of op)
'mstd' - float std deviation of magnitude noise applied
'inc' - integer (bool), use augmentations that increase in severity with magnitude (default: 0)
Ex 'rand-m9-n3-mstd0.5' results in RandAugment with magnitude 9, num_layers 3, magnitude_std 0.5
'rand-mstd1-w0' results in magnitude_std 1.0, weights 0, default magnitude of 10 and num_layers 2
:param hparams: Other hparams (kwargs) for the RandAugmentation scheme
:return: A callable Transform Op
"""
magnitude = _MAX_LEVEL # default to _MAX_LEVEL for magnitude (currently 10)
num_layers = 2 # default to 2 ops per image
weight_idx = None # default to no probability weights for op choice
transforms = _RAND_TRANSFORMS
config = config_str.split('-')
assert config[0] == 'rand'
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
elif key == 'inc':
if bool(val):
transforms = _RAND_INCREASING_TRANSFORMS
elif key == 'm':
magnitude = int(val)
elif key == 'n':
num_layers = int(val)
elif key == 'w':
weight_idx = int(val)
else:
assert False, 'Unknown RandAugment config section'
ra_ops = rand_augment_ops(
magnitude=magnitude, hparams=hparams, transforms=transforms)
choice_weights = None if weight_idx is None else _select_rand_weights(
weight_idx)
return RandAugment(ra_ops, num_layers, choice_weights=choice_weights)
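A hedged usage sketch of the RandAugment config grammar documented above; the import path and values are assumptions. "rand-m9-mstd0.5-inc1" selects the increasing-severity transform set with magnitude 9 and magnitude_std 0.5.

from PIL import Image
from ppcls.data.preprocess.ops.timm_autoaugment import rand_augment_transform  # assumed path

hparams = dict(translate_const=100, img_mean=(128, 128, 128))
ra = rand_augment_transform("rand-m9-mstd0.5-inc1", hparams)
out = ra(Image.new("RGB", (224, 224), (127, 127, 127)))  # num_layers=2 random ops applied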
_AUGMIX_TRANSFORMS = [
'AutoContrast',
'ColorIncreasing', # not in paper
'ContrastIncreasing', # not in paper
'BrightnessIncreasing', # not in paper
'SharpnessIncreasing', # not in paper
'Equalize',
'Rotate',
'PosterizeIncreasing',
'SolarizeIncreasing',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
]
def augmix_ops(magnitude=10, hparams=None, transforms=None):
hparams = hparams or _HPARAMS_DEFAULT
transforms = transforms or _AUGMIX_TRANSFORMS
return [
AugmentOp(
name, prob=1.0, magnitude=magnitude, hparams=hparams)
for name in transforms
]
class AugMixAugment(object):
""" AugMix Transform
Adapted and improved from impl here: https://github.com/google-research/augmix/blob/master/imagenet.py
From paper: 'AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty -
https://arxiv.org/abs/1912.02781
"""
def __init__(self, ops, alpha=1., width=3, depth=-1, blended=False):
self.ops = ops
self.alpha = alpha
self.width = width
self.depth = depth
self.blended = blended # blended mode is faster but not well tested
def _calc_blended_weights(self, ws, m):
ws = ws * m
cump = 1.
rws = []
for w in ws[::-1]:
alpha = w / cump
cump *= (1 - alpha)
rws.append(alpha)
return np.array(rws[::-1], dtype=np.float32)
def _apply_blended(self, img, mixing_weights, m):
# This is my first crack at implementing a slightly faster mixed augmentation. Instead
# of accumulating the mix for each chain in a Numpy array and then blending with original,
# it recomputes the blending coefficients and applies one PIL image blend per chain.
# TODO the results appear in the right ballpark but they differ by more than rounding.
img_orig = img.copy()
ws = self._calc_blended_weights(mixing_weights, m)
for w in ws:
depth = self.depth if self.depth > 0 else np.random.randint(1, 4)
ops = np.random.choice(self.ops, depth, replace=True)
img_aug = img_orig # no ops are in-place, deep copy not necessary
for op in ops:
img_aug = op(img_aug)
img = Image.blend(img, img_aug, w)
return img
def _apply_basic(self, img, mixing_weights, m):
# This is a literal adaptation of the paper/official implementation without normalizations and
# PIL <-> Numpy conversions between every op. It is still quite CPU compute heavy compared to the
# typical augmentation transforms, could use a GPU / Kornia implementation.
img_shape = img.size[0], img.size[1], len(img.getbands())
mixed = np.zeros(img_shape, dtype=np.float32)
for mw in mixing_weights:
depth = self.depth if self.depth > 0 else np.random.randint(1, 4)
ops = np.random.choice(self.ops, depth, replace=True)
img_aug = img # no ops are in-place, deep copy not necessary
for op in ops:
img_aug = op(img_aug)
mixed += mw * np.asarray(img_aug, dtype=np.float32)
np.clip(mixed, 0, 255., out=mixed)
mixed = Image.fromarray(mixed.astype(np.uint8))
return Image.blend(img, mixed, m)
def __call__(self, img):
mixing_weights = np.float32(
np.random.dirichlet([self.alpha] * self.width))
m = np.float32(np.random.beta(self.alpha, self.alpha))
if self.blended:
mixed = self._apply_blended(img, mixing_weights, m)
else:
mixed = self._apply_basic(img, mixing_weights, m)
return mixed
def augment_and_mix_transform(config_str, hparams):
""" Create AugMix transform
:param config_str: String defining configuration of random augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the variant of the augmentation (currently only 'augmix'). The remaining
sections, which are not order specific, determine
'm' - integer magnitude (severity) of augmentation mix (default: 3)
'w' - integer width of augmentation chain (default: 3)
'd' - integer depth of augmentation chain (-1 is random [1, 3], default: -1)
'b' - integer (bool), blend each branch of chain into end result without a final blend, less CPU (default: 0)
'mstd' - float std deviation of magnitude noise applied (default: 0)
Ex 'augmix-m5-w4-d2' results in AugMix with severity 5, chain width 4, chain depth 2
:param hparams: Other hparams (kwargs) for the Augmentation transforms
:return: A callable Transform Op
"""
magnitude = 3
width = 3
depth = -1
alpha = 1.
blended = False
config = config_str.split('-')
assert config[0] == 'augmix'
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
elif key == 'm':
magnitude = int(val)
elif key == 'w':
width = int(val)
elif key == 'd':
depth = int(val)
elif key == 'a':
alpha = float(val)
elif key == 'b':
blended = bool(val)
else:
assert False, 'Unknown AugMix config section'
ops = augmix_ops(magnitude=magnitude, hparams=hparams)
return AugMixAugment(
ops, alpha=alpha, width=width, depth=depth, blended=blended)
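A hedged usage sketch of the AugMix config grammar documented above; the import path and values are assumptions. "augmix-m5-w4-d2" gives severity 5, chain width 4 and chain depth 2.

from PIL import Image
from ppcls.data.preprocess.ops.timm_autoaugment import augment_and_mix_transform  # assumed path

hparams = dict(translate_const=100, img_mean=(128, 128, 128), translate_pct=0.3)
am = augment_and_mix_transform("augmix-m5-w4-d2", hparams)
out = am(Image.new("RGB", (224, 224), (127, 127, 127)))  # width chains mixed with Dirichlet weights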
class RawTimmAutoAugment(object):
"""TimmAutoAugment API for PaddleClas."""
def __init__(self,
config_str="rand-m9-mstd0.5-inc1",
interpolation="bicubic",
img_size=224,
mean=IMAGENET_DEFAULT_MEAN):
if isinstance(img_size, (tuple, list)):
img_size_min = min(img_size)
else:
img_size_min = img_size
aa_params = dict(
translate_const=int(img_size_min * 0.45),
img_mean=tuple([min(255, round(255 * x)) for x in mean]), )
if interpolation and interpolation != 'random':
aa_params['interpolation'] = _pil_interp(interpolation)
if config_str.startswith('rand'):
self.augment_func = rand_augment_transform(config_str, aa_params)
elif config_str.startswith('augmix'):
aa_params['translate_pct'] = 0.3
self.augment_func = augment_and_mix_transform(config_str,
aa_params)
elif config_str.startswith('auto'):
self.augment_func = auto_augment_transform(config_str, aa_params)
else:
raise Exception(
"ConfigError: The TimmAutoAugment Op only support RandAugment, AutoAugment, AugMix, and the config_str only starts with \"rand\", \"augmix\", \"auto\"."
)
def __call__(self, img):
return self.augment_func(img)

View File

@ -200,7 +200,7 @@ class Engine(object):
if self.mode == 'train':
self.optimizer, self.lr_sch = build_optimizer(
self.config["Optimizer"], self.config["Global"]["epochs"],
len(self.train_dataloader), self.model.parameters())
len(self.train_dataloader), [self.model])
# for distributed
self.config["Global"][
@ -355,7 +355,8 @@ class Engine(object):
def export(self):
assert self.mode == "export"
model = ExportModel(self.config["Arch"], self.model)
use_multilabel = self.config["Global"].get("use_multilabel", False)
model = ExportModel(self.config["Arch"], self.model, use_multilabel)
if self.config["Global"]["pretrained_model"] is not None:
load_dygraph_pretrain(model.base_model,
self.config["Global"]["pretrained_model"])
@ -388,10 +389,9 @@ class ExportModel(nn.Layer):
ExportModel: add softmax onto the model
"""
def __init__(self, config, model):
def __init__(self, config, model, use_multilabel):
super().__init__()
self.base_model = model
# we should choose a final model to export
if isinstance(self.base_model, DistillationModel):
self.infer_model_name = config["infer_model_name"]
@ -402,10 +402,13 @@ class ExportModel(nn.Layer):
if self.infer_output_key == "features" and isinstance(self.base_model,
RecModel):
self.base_model.head = IdentityHead()
if config.get("infer_add_softmax", True):
self.softmax = nn.Softmax(axis=-1)
if use_multilabel:
self.out_act = nn.Sigmoid()
else:
self.softmax = None
if config.get("infer_add_softmax", True):
self.out_act = nn.Softmax(axis=-1)
else:
self.out_act = None
def eval(self):
self.training = False
@ -421,6 +424,6 @@ class ExportModel(nn.Layer):
x = x[self.infer_model_name]
if self.infer_output_key is not None:
x = x[self.infer_output_key]
if self.softmax is not None:
x = self.softmax(x)
if self.out_act is not None:
x = self.out_act(x)
return x
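A hedged sketch of the activation choice introduced above: multi-label export applies an element-wise sigmoid so class scores are independent, while single-label export keeps the softmax over classes. Values are illustrative only.

import paddle
import paddle.nn as nn

logits = paddle.to_tensor([[2.0, -1.0, 0.5]])
print(nn.Sigmoid()(logits))         # independent per-class probabilities (multi-label)
print(nn.Softmax(axis=-1)(logits))  # probabilities summing to 1 (single-label)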

Some files were not shown because too many files have changed in this diff.