Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleClas into benchmark

pull/1245/head
dongshuilong 2021-10-13 09:26:28 +00:00
commit 272bc9481d
152 changed files with 8357 additions and 933 deletions
.gitignore

@ -3,10 +3,11 @@ __pycache__/
*.sw*
*/workerlog*
checkpoints/
output/
output*/
pretrained/
.ipynb_checkpoints/
*.ipynb*
_build/
build/
log/
nohup.out


@ -7,7 +7,7 @@
PaddleClas is an image recognition toolkit prepared by PaddlePaddle for industry and academia, helping users train better vision models and deploy them in real applications.
**Recent updates**
- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas; these models are highly competitive on Intel CPUs. The metrics and pretrained weights can be downloaded [here](docs/zh_CN/ImageNet_models.md).
- 2021.08.11 Updated 7 [FAQ](docs/zh_CN/faq_series/faq_2021_s2.md) entries.
- 2021.06.29 Added the Swin Transformer series of models; the highest top-1 accuracy on the ImageNet-1k dataset reaches 87.2%. Training, inference, evaluation and whl-package deployment are supported; pretrained models can be downloaded [here](docs/zh_CN/models/models_intro.md).
- 2021.06.22/23/24 The PaddleClas R&D team presented a three-day live course with in-depth technical explanations. Course replay: [https://aistudio.baidu.com/aistudio/course/introduce/24519](https://aistudio.baidu.com/aistudio/course/introduce/24519)


@ -8,6 +8,8 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
**Recent updates**
- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas; these models show strong competitiveness on Intel CPUs. The metrics and pretrained models are available [here](docs/en/ImageNet_models_en.md).
- 2021.06.29 Added the Swin Transformer series of models; the highest top-1 accuracy on the ImageNet-1k dataset reaches 87.2%. Training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
- 2021.06.16 PaddleClas release/2.2. Added the metric learning and vector search modules, along with product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet and RedNet, whose accuracy is roughly the same as reported in the papers.
- [more](./docs/en/update_history_en.md)


@ -0,0 +1,33 @@
Global:
  infer_imgs: "./images/0517_2715693311.jpg"
  inference_model_dir: "../inference/"
  batch_size: 1
  use_gpu: True
  enable_mkldnn: False
  cpu_num_threads: 10
  enable_benchmark: True
  use_fp16: False
  ir_optim: True
  use_tensorrt: False
  gpu_mem: 8000
  enable_profile: False
PreProcess:
  transform_ops:
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 0.00392157
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
        channel_num: 3
    - ToCHWImage:
PostProcess:
  main_indicator: MultiLabelTopk
  MultiLabelTopk:
    topk: 5
    class_id_map_file: None
  SavePreLabel:
    save_dir: ./pre_label/
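For reference, the `MultiLabelTopk` post-process configured above keeps every class whose probability reaches 0.5 (see the `postprocess.py` change later in this commit); a minimal sketch of that selection rule, using a hypothetical probability vector:

```python
# Sketch of the MultiLabelTopk selection rule: keep classes with prob >= 0.5.
# The probability vector below is a hypothetical example, not an output of this config.
import numpy as np

probs = np.array([0.96, 0.12, 0.56, 0.55, 0.99, 0.33])
class_ids = np.where(probs >= 0.5)[0].astype("int32")
print(class_ids)  # [0 2 3 4]
```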

(new binary image added, 16 KiB)


@ -4,9 +4,9 @@
PaddleClas provides two service deployment methods:
- Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../../deploy/hubserving/readme_en.md)
- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". Please follow this tutorial.
- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". If you prefer the retrieval-based image recognition service, please refer to the [tutorial](./recognition/README.md); if you'd like the image classification service, please follow this tutorial.
# Service deployment based on PaddleServing
# Image Classification Service deployment based on PaddleServing
This document will introduce how to use the [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the ResNet50_vd model as a pipeline online service.
@ -131,7 +131,7 @@ fetch_var {
config.yml # configuration file of starting the service
pipeline_http_client.py # script to send pipeline prediction request by http
pipeline_rpc_client.py # script to send pipeline prediction request by rpc
resnet50_web_service.py # start the script of the pipeline server
classification_web_service.py # start the script of the pipeline server
```
2. Run the following command to start the service.
@ -147,7 +147,7 @@ fetch_var {
python3 pipeline_http_client.py
```
After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
![](./imgs/results.png)
![](./imgs/results.png)
Adjust the number of concurrency in config.yml to get the largest QPS.


@ -4,9 +4,9 @@
PaddleClas provides two service deployment methods:
- Deployment based on PaddleHub Serving: the code path is "`./deploy/hubserving`"; see the [document](../../deploy/hubserving/readme.md) for usage.
- Deployment based on PaddleServing: the code path is "`./deploy/paddleserving`"; follow this tutorial.
- Deployment based on PaddleServing: the code path is "`./deploy/paddleserving`". For the retrieval-based image recognition service, refer to the [document](./recognition/README_CN.md); for the image classification service, follow this tutorial.
# Service deployment based on PaddleServing
# Image classification service deployment based on PaddleServing
Taking the classic ResNet50_vd model as an example, this document introduces how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy a pipeline online service for PaddleClas dygraph models.
@ -127,7 +127,7 @@ fetch_var {
config.yml                    # configuration file for starting the service
pipeline_http_client.py       # script to send pipeline prediction requests over http
pipeline_rpc_client.py        # script to send pipeline prediction requests over rpc
resnet50_web_service.py       # script to start the pipeline server
classification_web_service.py # script to start the pipeline server
```
2. Run the following command to start the service:

(new binary image added, 28 KiB)

(new binary image added, 123 KiB)


@ -0,0 +1,178 @@
# Product Recognition Service deployment based on PaddleServing
(English|[简体中文](./README_CN.md))
This document will introduce how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the retrieval-based product recognition model as a pipeline online service.
Some key features of Paddle Serving:
- Integrates with the Paddle training pipeline seamlessly; most Paddle models can be deployed with a one-line command.
- Industrial serving features are supported, such as model management, online loading and online A/B testing.
- Highly concurrent and efficient communication between clients and servers is supported.
For an introduction to the Paddle Serving deployment framework and a tutorial, refer to the [document](https://github.com/PaddlePaddle/Serving/blob/develop/README.md).
## Contents
- [Environmental preparation](#environmental-preparation)
- [Model conversion](#model-conversion)
- [Paddle Serving pipeline deployment](#paddle-serving-pipeline-deployment)
- [FAQ](#faq)
<a name="environmental-preparation"></a>
## Environmental preparation
Both the PaddleClas and PaddleServing operating environments are needed.
1. Please prepare the PaddleClas operating environment by referring to this [link](../../docs/zh_CN/tutorials/install.md).
Download the paddle whl package matching your environment; version 2.1.0 is recommended.
2. The steps to prepare the PaddleServing operating environment are as follows:
Install paddle-serving-server, which is used to start the service:
```
pip3 install paddle-serving-server==0.6.1 # for CPU
pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
# Other GPU environments need to confirm the environment and then choose to execute the following commands
pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
```
3. Install the client, which is used to send requests to the service.
Find the client installation package corresponding to your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
Python 3.7 is recommended here:
```
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
```
4. Install serving-app
```
pip3 install paddle-serving-app==0.6.1
```
**Note:** If you want to install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
<a name="model-conversion"></a>
## Model conversion
When using PaddleServing for service deployment, you need to convert the saved inference model into a serving model that is easy to deploy.
The following assumes that the current working directory is the PaddleClas root directory.
First, download the ResNet50_vd inference model for product recognition:
```
cd deploy
# Download and unzip the ResNet50_vd model
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd models
tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
```
Then, you can use the installed paddle_serving_client tool to convert the inference model into a model format that is easy for the server to deploy.
```
# Product recognition model conversion
python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
--serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
```
After the ResNet50_vd inference model is converted, there will be additional folders of `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client` in the current folder, with the following format:
```
|- product_ResNet50_vd_aliproduct_v1.0_serving/
|- __model__
|- __params__
|- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt
|- product_ResNet50_vd_aliproduct_v1.0_client
|- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt
```
Once you have the model files for deployment, you need to change the alias name in `serving_server_conf.prototxt`: change `alias_name` in `fetch_var` to `features`.
The modified serving_server_conf.prototxt file is as follows:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
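If you would rather script this edit than modify the file by hand, a minimal sketch is shown below. It assumes the converted model lives in `product_ResNet50_vd_aliproduct_v1.0_serving/` and that the fetch variable's default alias equals its name, `save_infer_model/scale_0.tmp_1`, as in the example above:

```python
# Sketch: rewrite the fetch_var alias_name in serving_server_conf.prototxt.
# The path and the original alias string are assumptions based on the example above.
from pathlib import Path

conf_path = Path("product_ResNet50_vd_aliproduct_v1.0_serving/serving_server_conf.prototxt")
text = conf_path.read_text()
text = text.replace('alias_name: "save_infer_model/scale_0.tmp_1"',
                    'alias_name: "features"')
conf_path.write_text(text)
```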
Next, download and unpack the pre-built index of the product gallery:
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
```
<a name="paddle-serving-pipeline-deployment"></a>
## Paddle Serving pipeline deployment
**Attention:** the pipeline deployment mode does not support the Windows platform.
1. Download the PaddleClas code; if you have already downloaded it, you can skip this step.
```
git clone https://github.com/PaddlePaddle/PaddleClas
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
```
The paddleserving directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # configuration file of starting the service
pipeline_http_client.py # script to send pipeline prediction request by http
pipeline_rpc_client.py # script to send pipeline prediction request by rpc
recognition_web_service.py # start the script of the pipeline server
```
2. Run the following command to start the service.
```
# Start the service and save the running log in log.txt
python3 recognition_web_service.py &>log.txt &
```
After the service is successfully started, a log similar to the following will be printed in log.txt
![](../imgs/start_server_recog.png)
3. Send service request
```
python3 pipeline_http_client.py
```
After successfully running, the predicted result of the model will be printed in the cmd window. An example of the result is:
![](../imgs/results_recog.png)
Adjust the number of concurrency in config.yml to get the largest QPS.
```
op:
concurrency: 8
...
```
Multiple service requests can be sent at the same time if necessary.
The predicted performance data will be automatically written into the `PipelineServingLogs/pipeline.tracer` file.
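As an illustration of sending several requests at once, here is a minimal sketch that posts concurrent requests to the service started above. It assumes the service is listening on 127.0.0.1:18081 and that the demo image `daoxiangcunjinzhubing_6.jpg` is in the current directory, as in `pipeline_http_client.py`:

```python
# Sketch: fire several pipeline requests in parallel; the URL and image name follow
# the examples in this tutorial and may need to be adjusted.
import base64
import json
from concurrent.futures import ThreadPoolExecutor

import requests

url = "http://127.0.0.1:18081/recognition/prediction"
with open("daoxiangcunjinzhubing_6.jpg", "rb") as f:
    image = base64.b64encode(f.read()).decode("utf8")
payload = json.dumps({"key": ["image"], "value": [image]})

def send(_):
    return requests.post(url=url, data=payload).json()

with ThreadPoolExecutor(max_workers=8) as pool:
    for result in pool.map(send, range(16)):
        print(result)
```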
<a name="faq"></a>
## FAQ
**Q1**: No result is returned after sending the request.
**A1**: Do not set a proxy when starting the service or sending requests. You can disable the proxy before starting the service and before sending requests. The commands to disable the proxy are:
```
unset https_proxy
unset http_proxy
```
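If unsetting the environment variables is not convenient, the proxy can also be ignored on the client side; a minimal sketch using the same requests library as `pipeline_http_client.py` (the payload below is a placeholder):

```python
# Sketch: ignore any proxy configured in the environment for this client only.
import requests

session = requests.Session()
session.trust_env = False  # do not pick up http_proxy / https_proxy environment variables
resp = session.post("http://127.0.0.1:18081/recognition/prediction", data="{}")
print(resp.status_code)
```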


@ -0,0 +1,174 @@
# Product recognition service deployment based on PaddleServing
([English](./README.md) | Simplified Chinese)
Taking product recognition as an example, this document introduces how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy a pipeline online service for PaddleClas dygraph models.
Compared with hubserving deployment, PaddleServing has the following advantages:
- Supports highly concurrent and efficient communication between clients and servers
- Supports industrial-grade serving capabilities, such as model management, online loading and online A/B testing
- Supports clients developed in multiple programming languages, such as C++, Python and Java
For more on the PaddleServing deployment framework and its tutorials, refer to the [document](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md).
## Contents
- [Environment preparation](#环境准备)
- [Model conversion](#模型转换)
- [Paddle Serving pipeline deployment](#部署)
- [FAQ](#FAQ)
<a name="环境准备"></a>
## Environment preparation
Both the PaddleClas and PaddleServing operating environments need to be prepared.
- Prepare the PaddleClas [operating environment](../../docs/zh_CN/tutorials/install.md); download the paddle whl package matching your environment (version 2.1.0 is recommended)
- The steps to prepare the PaddleServing operating environment are as follows:
1. Install paddle-serving-server, which is used to start the service:
```
pip3 install paddle-serving-server==0.6.1 # for CPU
pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
# For other GPU environments, confirm the environment before choosing one of the following commands
pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
```
2. Install the client, which is used to send requests to the service.
Find the client installation package for your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md); Python 3.7 is recommended here:
```
wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
```
3. Install serving-app:
```
pip3 install paddle-serving-app==0.6.1
```
**Note:** To install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
<a name="模型转换"></a>
## Model conversion
When deploying a service with PaddleServing, the saved inference model needs to be converted into a model that Serving can deploy easily.
The following assumes that the current working directory is the PaddleClas root directory.
First, download the inference model for product recognition:
```
cd deploy
# Download and unzip the product recognition model
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd models
tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
```
Next, use the installed paddle_serving_client to convert the downloaded inference model into a model format that is easy for the server to deploy.
```
# Convert the product recognition model
python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
--serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
```
After the product recognition inference model is converted, two additional folders, `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client`, will appear in the current directory, with the following structure:
```
|- product_ResNet50_vd_aliproduct_v1.0_serving/
|- __model__
|- __params__
|- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt
|- product_ResNet50_vd_aliproduct_v1.0_client
|- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt
```
Once you have the model files, you need to change the alias name in serving_server_conf.prototxt: change `alias_name` in `fetch_var` to `features`.
The modified serving_server_conf.prototxt is as follows:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
Next, download and unpack the pre-built index of the product gallery:
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
```
<a name="部署"></a>
## Paddle Serving pipeline部署
**注意:** pipeline部署方式不支持windows平台
1. 下载PaddleClas代码若已下载可跳过此步骤
```
git clone https://github.com/PaddlePaddle/PaddleClas
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
```
The paddleserving directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml                   # configuration file for starting the service
pipeline_http_client.py      # script to send pipeline prediction requests over http
pipeline_rpc_client.py       # script to send pipeline prediction requests over rpc
recognition_web_service.py   # script to start the pipeline server
```
2. Run the following command to start the service:
```
# Start the service; the running log is saved in log.txt
python3 recognition_web_service.py &>log.txt &
```
After the service is successfully started, a log similar to the following will be printed in log.txt:
![](../imgs/start_server_recog.png)
3. Send a service request:
```
python3 pipeline_http_client.py
```
After it runs successfully, the prediction results of the model will be printed in the terminal. An example of the result:
![](../imgs/results_recog.png)
Adjust the concurrency in config.yml to obtain the highest QPS:
```
op:
# concurrency: thread-level concurrency when is_thread_op=True, otherwise process-level concurrency
concurrency: 8
...
```
Multiple service requests can be sent at the same time if necessary.
The prediction performance data will be automatically written into the `PipelineServingLogs/pipeline.tracer` file.
<a name="FAQ"></a>
## FAQ
**Q1**: No result is returned after sending the request, or an output decoding error is reported.
**A1**: Do not set a proxy when starting the service or sending requests. You can disable the proxy before starting the service and before sending requests. The commands to disable the proxy are:
```
unset https_proxy
unset http_proxy
```


@ -0,0 +1,43 @@
#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own grpc server and DAG.
##When build_dag_each_worker=False, the framework sets max_workers of the main thread's grpc thread pool to worker_num.
worker_num: 1

#http port; rpc_port and http_port must not both be empty. When rpc_port is valid and http_port is empty, http_port is not generated automatically.
http_port: 18081
rpc_port: 9994

dag:
    #op resource type: True for the thread model, False for the process model
    is_thread_op: False
op:
    rec:
        #concurrency: thread-level when is_thread_op=True, otherwise process-level
        concurrency: 1

        #when the op configuration has no server_endpoints, the local service configuration is read from local_service_conf
        local_service_conf:

            #model path
            model_config: ../../models/product_ResNet50_vd_aliproduct_v1.0_serving

            #hardware type: if empty, decided by devices (CPU/GPU); 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
            device_type: 1

            #hardware IDs: when devices is "" or omitted, prediction runs on CPU; when devices is "0" or "0,1,2", prediction runs on the listed GPU cards
            devices: "0" # "0,1"

            #client type: brpc, grpc or local_predictor. local_predictor does not start a Serving service and predicts in-process
            client_type: local_predictor

            #fetch list, based on the alias_name of fetch_var in client_config
            fetch_list: ["features"]
    det:
        concurrency: 1
        local_service_conf:
            client_type: local_predictor
            device_type: 1
            devices: '0'
            fetch_list:
            - save_infer_model/scale_0.tmp_1
            model_config: ../../models/ppyolov2_r50vd_dcn_mainbody_v1.0_serving/

(new binary image added, 44 KiB)


@ -0,0 +1,2 @@
foreground
background


@ -0,0 +1,21 @@
import requests
import json
import base64
import os

imgpath = "daoxiangcunjinzhubing_6.jpg"

def cv2_to_base64(image):
    return base64.b64encode(image).decode('utf8')

if __name__ == "__main__":
    url = "http://127.0.0.1:18081/recognition/prediction"
    with open(os.path.join(".", imgpath), 'rb') as file:
        image_data1 = file.read()
    image = cv2_to_base64(image_data1)
    data = {"key": ["image"], "value": [image]}
    for i in range(1):
        r = requests.post(url=url, data=json.dumps(data))
        print(r.json())


@ -0,0 +1,34 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
try:
    from paddle_serving_server_gpu.pipeline import PipelineClient
except ImportError:
    from paddle_serving_server.pipeline import PipelineClient
import base64

client = PipelineClient()
client.connect(['127.0.0.1:9994'])
imgpath = "daoxiangcunjinzhubing_6.jpg"

def cv2_to_base64(image):
    return base64.b64encode(image).decode('utf8')

if __name__ == "__main__":
    with open(imgpath, 'rb') as file:
        image_data = file.read()
    image = cv2_to_base64(image_data)
    for i in range(1):
        ret = client.predict(feed_dict={"image": image}, fetch=["result"])
        print(ret)


@ -0,0 +1,198 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.web_service import WebService, Op

import logging
import numpy as np
import sys
import cv2
from paddle_serving_app.reader import *
import base64
import os
import faiss
import pickle
import json


class DetOp(Op):
    def init_op(self):
        self.img_preprocess = Sequential([
            BGR2RGB(), Div(255.0),
            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
            Resize((640, 640)), Transpose((2, 0, 1))
        ])
        self.img_postprocess = RCNNPostprocess("label_list.txt", "output")
        self.threshold = 0.2
        self.max_det_results = 5

    def generate_scale(self, im):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
        Returns:
            im_scale_x: the resize ratio of X
            im_scale_y: the resize ratio of Y
        """
        target_size = [640, 640]
        origin_shape = im.shape[:2]
        resize_h, resize_w = target_size
        im_scale_y = resize_h / float(origin_shape[0])
        im_scale_x = resize_w / float(origin_shape[1])
        return im_scale_y, im_scale_x

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        imgs = []
        raw_imgs = []
        for key in input_dict.keys():
            data = base64.b64decode(input_dict[key].encode('utf8'))
            raw_imgs.append(data)
            data = np.fromstring(data, np.uint8)
            raw_im = cv2.imdecode(data, cv2.IMREAD_COLOR)
            im_scale_y, im_scale_x = self.generate_scale(raw_im)
            im = self.img_preprocess(raw_im)
            imgs.append({
                "image": im[np.newaxis, :],
                "im_shape": np.array(list(im.shape[1:])).reshape(-1)[np.newaxis, :],
                "scale_factor": np.array([im_scale_y, im_scale_x]).astype('float32'),
            })
        self.raw_img = raw_imgs
        feed_dict = {
            "image": np.concatenate([x["image"] for x in imgs], axis=0),
            "im_shape": np.concatenate([x["im_shape"] for x in imgs], axis=0),
            "scale_factor": np.concatenate([x["scale_factor"] for x in imgs], axis=0)
        }
        return feed_dict, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, log_id):
        boxes = self.img_postprocess(fetch_dict, visualize=False)
        boxes.sort(key=lambda x: x["score"], reverse=True)
        boxes = filter(lambda x: x["score"] >= self.threshold, boxes[:self.max_det_results])
        boxes = list(boxes)
        for i in range(len(boxes)):
            boxes[i]["bbox"][2] += boxes[i]["bbox"][0] - 1
            boxes[i]["bbox"][3] += boxes[i]["bbox"][1] - 1
        result = json.dumps(boxes)
        res_dict = {"bbox_result": result, "image": self.raw_img}
        return res_dict, None, ""


class RecOp(Op):
    def init_op(self):
        self.seq = Sequential([
            BGR2RGB(), Resize((224, 224)),
            Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225],
                                False), Transpose((2, 0, 1))
        ])

        index_dir = "../../recognition_demo_data_v1.1/gallery_product/index"
        assert os.path.exists(os.path.join(
            index_dir, "vector.index")), "vector.index not found ..."
        assert os.path.exists(os.path.join(
            index_dir, "id_map.pkl")), "id_map.pkl not found ... "

        self.searcher = faiss.read_index(
            os.path.join(index_dir, "vector.index"))

        with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
            self.id_map = pickle.load(fd)

        self.rec_nms_thresold = 0.05
        self.rec_score_thres = 0.5
        self.feature_normalize = True
        self.return_k = 1

    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        raw_img = input_dict["image"][0]
        data = np.frombuffer(raw_img, np.uint8)
        origin_img = cv2.imdecode(data, cv2.IMREAD_COLOR)
        dt_boxes = input_dict["bbox_result"]
        boxes = json.loads(dt_boxes)
        boxes.append({
            "category_id": 0,
            "score": 1.0,
            "bbox": [0, 0, origin_img.shape[1], origin_img.shape[0]]
        })
        self.det_boxes = boxes

        #construct batch images for rec
        imgs = []
        for box in boxes:
            box = [int(x) for x in box["bbox"]]
            im = origin_img[box[1]:box[3], box[0]:box[2]].copy()
            img = self.seq(im)
            imgs.append(img[np.newaxis, :].copy())

        input_imgs = np.concatenate(imgs, axis=0)
        return {"x": input_imgs}, False, None, ""

    def nms_to_rec_results(self, results, thresh=0.1):
        filtered_results = []
        x1 = np.array([r["bbox"][0] for r in results]).astype("float32")
        y1 = np.array([r["bbox"][1] for r in results]).astype("float32")
        x2 = np.array([r["bbox"][2] for r in results]).astype("float32")
        y2 = np.array([r["bbox"][3] for r in results]).astype("float32")
        scores = np.array([r["rec_scores"] for r in results])

        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
        order = scores.argsort()[::-1]
        while order.size > 0:
            i = order[0]
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])

            w = np.maximum(0.0, xx2 - xx1 + 1)
            h = np.maximum(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (areas[i] + areas[order[1:]] - inter)
            inds = np.where(ovr <= thresh)[0]
            order = order[inds + 1]
            filtered_results.append(results[i])
        return filtered_results

    def postprocess(self, input_dicts, fetch_dict, log_id):
        batch_features = fetch_dict["features"]

        if self.feature_normalize:
            feas_norm = np.sqrt(
                np.sum(np.square(batch_features), axis=1, keepdims=True))
            batch_features = np.divide(batch_features, feas_norm)

        scores, docs = self.searcher.search(batch_features, self.return_k)

        results = []
        for i in range(scores.shape[0]):
            pred = {}
            if scores[i][0] >= self.rec_score_thres:
                pred["bbox"] = [int(x) for x in self.det_boxes[i]["bbox"]]
                pred["rec_docs"] = self.id_map[docs[i][0]].split()[1]
                pred["rec_scores"] = scores[i][0]
                results.append(pred)

        #do nms
        results = self.nms_to_rec_results(results, self.rec_nms_thresold)
        return {"result": str(results)}, None, ""


class RecognitionService(WebService):
    def get_pipeline_response(self, read_op):
        det_op = DetOp(name="det", input_ops=[read_op])
        rec_op = RecOp(name="rec", input_ops=[det_op])
        return rec_op


product_recog_service = RecognitionService(name="recognition")
product_recog_service.prepare_pipeline_config("config.yml")
product_recog_service.run_service()


@ -81,12 +81,14 @@ class Topk(object):
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None):
def __call__(self, x, file_names=None, multilabel=False):
if file_names is not None:
assert x.shape[0] == len(file_names)
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32")
index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
"int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
@ -108,6 +110,14 @@ class Topk(object):
return y
class MultiLabelTopk(Topk):
def __init__(self, topk=1, class_id_map_file=None):
super().__init__()
def __call__(self, x, file_names=None):
return super().__call__(x, file_names, multilabel=True)
class SavePreLabel(object):
def __init__(self, save_dir):
if save_dir is None:
@ -128,23 +138,24 @@ class SavePreLabel(object):
os.makedirs(output_dir, exist_ok=True)
shutil.copy(image_file, output_dir)
class Binarize(object):
def __init__(self, method = "round"):
def __init__(self, method="round"):
self.method = method
self.unit = np.array([[128, 64, 32, 16, 8, 4, 2, 1]]).T
def __call__(self, x, file_names=None):
if self.method == "round":
x = np.round(x + 1).astype("uint8") - 1
if self.method == "sign":
x = ((np.sign(x) + 1) / 2).astype("uint8")
embedding_size = x.shape[1]
assert embedding_size % 8 == 0, "The Binary index only support vectors with sizes multiple of 8"
byte = np.zeros([x.shape[0], embedding_size // 8], dtype=np.uint8)
for i in range(embedding_size // 8):
byte[:, i:i+1] = np.dot(x[:, i * 8: (i + 1)* 8], self.unit)
byte[:, i:i + 1] = np.dot(x[:, i * 8:(i + 1) * 8], self.unit)
return byte


@ -71,7 +71,6 @@ class ClsPredictor(Predictor):
output_names = self.paddle_predictor.get_output_names()
output_tensor = self.paddle_predictor.get_output_handle(output_names[
0])
if self.benchmark:
self.auto_logger.times.start()
if not isinstance(images, (list, )):
@ -119,7 +118,6 @@ def main(config):
) == len(image_list):
if len(batch_imgs) == 0:
continue
batch_results = cls_predictor.predict(batch_imgs)
for number, result_dict in enumerate(batch_results):
filename = batch_names[number]


@ -19,12 +19,14 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from functools import partial
import six
import math
import random
import cv2
import numpy as np
import importlib
from PIL import Image
from python.det_preprocess import DetNormalizeImage, DetPadStride, DetPermute, DetResize
@ -50,6 +52,50 @@ def create_operators(params):
return ops
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
logger.warning(
f"The backend of Resize only support \"cv2\" or \"PIL\". \"f{backend}\" is unavailable. Use \"cv2\" instead."
)
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
class OperatorParamError(ValueError):
""" OperatorParamError
"""
@ -87,8 +133,11 @@ class DecodeImage(object):
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size=None,
resize_short=None,
interpolation=None,
backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
@ -101,6 +150,9 @@ class ResizeImage(object):
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
@ -110,10 +162,7 @@ class ResizeImage(object):
else:
w = self.w
h = self.h
if self.interpolation is None:
return cv2.resize(img, (w, h))
else:
return cv2.resize(img, (w, h), interpolation=self.interpolation)
return self._resize_func(img, (w, h))
class CropImage(object):
@ -145,9 +194,12 @@ class CropImage(object):
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size,
scale=None,
ratio=None,
interpolation=None,
backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
@ -156,6 +208,9 @@ class RandCropImage(object):
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
@ -181,10 +236,8 @@ class RandCropImage(object):
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
if self.interpolation is None:
return cv2.resize(img, size)
else:
return cv2.resize(img, size, interpolation=self.interpolation)
return self._resize_func(img, size)
class RandFlipImage(object):
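For context, the change above routes `ResizeImage` and `RandCropImage` through the new `UnifiedResize` helper, which accepts a named interpolation and a `cv2` or `PIL` backend. A minimal usage sketch, assuming the deploy-side module path `python/preprocess.py` as used elsewhere in this commit:

```python
# Sketch: exercising the interpolation/backend options added to ResizeImage.
# The import path is an assumption (deploy/python/preprocess.py, run from deploy/).
import numpy as np
from python.preprocess import ResizeImage

img = (np.random.rand(480, 640, 3) * 255).astype("uint8")  # stand-in for a decoded image
resize_cv2 = ResizeImage(resize_short=256, interpolation="bilinear", backend="cv2")
resize_pil = ResizeImage(resize_short=256, interpolation="bicubic", backend="pil")
print(resize_cv2(img).shape, resize_pil(img).shape)
```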


@ -1,6 +1,9 @@
# classification
python3.7 python/predict_cls.py -c configs/inference_cls.yaml
# multilabel_classification
#python3.7 python/predict_cls.py -c configs/inference_multilabel_cls.yaml
# feature extractor
# python3.7 python/predict_rec.py -c configs/inference_rec.yaml


@ -24,13 +24,13 @@ Accuracy and inference time of the prtrained models based on SSLD distillation a
* Server-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| ResNet50_vd_ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 13.707 | 34.435 | 34.58 | 77.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) |
| SE_HRNet_W64_C_ssld | 0.848 | - | - | 31.697 | 94.995 | 57.83 | 128.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) |
@ -38,19 +38,44 @@ Accuracy and inference time of the prtrained models based on SSLD distillation a
* Mobile-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | 模型大小(M) | Download Address |
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | Storage Size(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)
| MobileNetV1_ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams)
* Intel-CPU-side distillation pretrained models
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M) | Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| PPLCNet_x0_5_ssld | 0.661 | 0.631 | 0.030 | 2.05 | 47 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) |
| PPLCNet_x1_0_ssld | 0.744 | 0.713 | 0.033 | 2.46 | 161 | 3.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) |
| PPLCNet_x2_5_ssld | 0.808 | 0.766 | 0.042 | 5.39 | 906 | 9.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) |
* Note: `Reference Top-1 Acc` means accuracy of pretrained models which are trained on ImageNet1k dataset.
<a name="PPLCNet_series"></a>
### PPLCNet_series
Accuracy and inference time metrics of the PPLCNet series models are shown as follows. More details can be found in the [PPLCNet series tutorial](../en/models/PPLCNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | Download Address |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 1.74 | 18 | 1.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 1.92 | 29 | 1.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 2.05 | 47 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 2.29 | 99 | 2.4 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 2.46 | 161 | 3.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 3.19 | 342 | 4.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 4.27 | 590 | 6.5 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 5.39 | 906 | 9.0 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
<a name="ResNet_and_Vd_series"></a>
### ResNet and Vd series


@ -25,58 +25,68 @@ tar -xf NUS-SCENE-dataset.tar
cd ../../
```
## Environment
### Download pretrained model
You can use the following commands to download the pretrained model of ResNet50_vd.
```bash
mkdir pretrained
cd pretrained
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
cd ../
```
## Training
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
--gpus="0" \
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```
After training for 10 epochs, the best accuracy over the validation set should be around 0.72.
After training for 10 epochs, the best accuracy over the validation set should be around 0.95.
## Evaluation
```bash
python tools/eval.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml \
-o pretrained_model="./output/ResNet50_vd/best_model/ppcls" \
-o load_static_weights=False
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
The evaluation metric is mAP, which is commonly used in multilabel tasks to measure model performance. The mAP over the validation set should be around 0.57.
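For reference, a minimal sketch of how a macro-averaged mAP of this kind could be computed, assuming scikit-learn is available; the arrays are hypothetical placeholders rather than outputs of this tutorial, and the exact metric implementation used by PaddleClas may differ in detail:

```python
# Sketch: macro-averaged mAP for multilabel predictions (hypothetical data).
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])                     # ground-truth labels
y_score = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.6], [0.8, 0.4, 0.3]])  # predicted probabilities

ap_per_class = [average_precision_score(y_true[:, c], y_score[:, c])
                for c in range(y_true.shape[1])]
print("mAP:", float(np.mean(ap_per_class)))
```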
## Prediction
```bash
python tools/infer/infer.py \
-i "./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/0199_434752251.jpg" \
--model ResNet50_vd \
--pretrained_model "./output/ResNet50_vd/best_model/ppcls" \
--use_gpu True \
--load_static_weights False \
--multilabel True \
--class_num 33
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
You will get multiple outputs such as the following:
```
class id: 3, probability: 0.6025
class id: 23, probability: 0.5491
class id: 32, probability: 0.7006
```
```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```
## Prediction based on prediction engine
### Export model
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
The default path of the inference model is under the current path `./inference`
### Prediction based on prediction engine
Enter the deploy directory:
```bash
cd ./deploy
```
Prediction based on prediction engine:
```
python3 python/predict_cls.py \
-c configs/inference_multilabel_cls.yaml
```
You will get multiple outputs such as the following:
```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```


@ -22,9 +22,8 @@ The feature learning config file description can be found in [yaml description](
The following are the pretrained models trained on different dataset.
- Vehicle Fine-Grained Classification[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.1_pretrained.pdparams)
- Vehicle ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.0_pretrained.pdparams)
- Vehicle Fine-Grained Classification[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams)
- Vehicle ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams)
- Cartoon Character Recognition[iCartoon](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/cartoon_rec_ResNet50_iCartoon_v1.0_pretrained.pdparams)
- Logo Recognition[Logo 3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.0_pretrained.pdparams)
- Product Recognition [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.0.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)
- Logo Recognition[Logo 3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.1_pretrained.pdparams)
- Product Recognition [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.1.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)


@ -0,0 +1,41 @@
# PPLCNet series
## Overview
The PPLCNet series is a family of networks with excellent performance on Intel CPUs, proposed by the Baidu PaddleCV team. The authors summarize several methods that improve model accuracy on Intel CPUs while barely increasing inference time, and combine them into a new network, namely PPLCNet. Compared with other lightweight networks, PPLCNet achieves higher accuracy at the same inference time. PPLCNet has shown strong competitiveness in image classification, object detection, and semantic segmentation.
## Accuracy, FLOPS and Parameters
| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |
## Inference speed based on Intel(R)-Xeon(R)-Gold-6148-CPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |


@ -14,13 +14,13 @@ After preparing the configuration file, The training process can be started in t
```
python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="" \
-o use_gpu=False
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=False \
-o Global.device=gpu
```
Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters needed to be modified or added, `-o pretrained_model=""` means to not using pre-trained models.
`-o use_gpu=True` means to use GPU for training. If you want to use the CPU for training, you need to set `use_gpu` to `False`.
Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters that need to be modified or added, and `-o Arch.pretrained=False` means not to use pre-trained models.
`-o Global.device=gpu` means to use GPU for training. If you want to use the CPU for training, you need to set `Global.device` to `cpu`.
Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config_description_en.md).
@ -54,12 +54,12 @@ After configuring the configuration file, you can finetune it by loading the pre
```
python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained" \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=True \
-o Global.device=gpu
```
Among them, `-o pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
Among them, `-o Arch.pretrained` is used to set the path from which to load the pretrained weights. When using it, replace it with your own pretrained weights' path, or modify the path directly in the configuration file. You can also set it to `True` to use pretrained weights trained on ImageNet-1k.
We also provide a lot of pre-trained models trained on the ImageNet-1k dataset. For the model list and download address, please refer to the [model library overview](../models/models_intro_en.md).
@ -69,28 +69,26 @@ If the training process is terminated for some reasons, you can also load the ch
```
python tools/train.py \
-c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-o last_epoch=5 \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o Global.device=gpu
```
The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
**Note**:
* The parameter `-o last_epoch=5` means to record the number of the last training epoch as `5`, that is, the number of this training epoch starts from `6`, and the parameter defaults to `-1`, which means the number of this training epoch starts from `0`.
* The `-o checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints as shown below during the training process. If you want to continue training from the epoch `5`, Just set the `checkpoints` to `./output/MobileNetV3_large_x1_0_gpupaddle/5/ppcls`, PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.
* The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints as shown below during the training process. If you want to continue training from the epoch `5`, Just set the `Global.checkpoints` to `../output/MobileNetV3_large_x1_0/epoch_5`, PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.
```shell
output/
── MobileNetV3_large_x1_0
├── 0
│ ├── ppcls.pdopt
│ └── ppcls.pdparams
├── 1
│ ├── ppcls.pdopt
│ └── ppcls.pdparams
output
── MobileNetV3_large_x1_0
│ ├── best_model.pdopt
│ ├── best_model.pdparams
│ ├── best_model.pdstates
│ ├── epoch_1.pdopt
│ ├── epoch_1.pdparams
│ ├── epoch_1.pdstates
.
.
.
@ -103,18 +101,15 @@ The model evaluation process can be started as follows.
```bash
python tools/eval.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-o load_static_weights=False
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
The above command will use `./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model/ppcls`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
The above command will use `./configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
Some of the configurable evaluation parameters are described as follows:
* `ARCHITECTURE.name`: Model name
* `pretrained_model`: The path of the model file to be evaluated
* `load_static_weights`: Whether the model to be evaluated is a static graph model
* `Arch.name`: Model name
* `Global.pretrained_model`: The path of the model file to be evaluated
**Note:** If the model is a dygraph type, you only need to specify the prefix of the model file when loading the model, instead of specifying the suffix, such as [1.3 Resume Training](#13-resume-training).
@ -125,26 +120,15 @@ If you want to run PaddleClas on Linux with GPU, it is highly recommended to use
### 2.1 Model training
After preparing the configuration file, The training process can be started in the following way. `paddle.distributed.launch` specifies the GPU running card number by setting `selected_gpus`:
After preparing the configuration file, the training process can be started in the following way. `paddle.distributed.launch` specifies the GPU card numbers to run on by setting `gpus`:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
```
The configuration can be updated by adding the `-o` parameter.
```bash
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="" \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
```
The format of output log information is the same as above, see [1.1 Model training](#11-model-training) for details.
@ -156,14 +140,14 @@ After configuring the configuration file, you can finetune it by loading the pre
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained"
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=True
```
Among them, `pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
Among them, `Arch.pretrained` can be set to `True` or `False`; it can also be used to set the path from which to load pretrained weights. When using your own weights, replace it with your own path, or modify the path directly in the configuration file.
There contains a lot of examples of model finetuning in [Quick Start](./quick_start_en.md). You can refer to this tutorial to finetune the model on a specific dataset.
@ -175,26 +159,26 @@ If the training process is terminated for some reasons, you can also load the ch
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-o last_epoch=5 \
-o use_gpu=True
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o Global.device=gpu
```
The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter. About `last_epoch` parameter, please refer [1.3 Resume training](#13-resume-training) for details.
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter as described in [1.3 Resume training](#13-resume-training).
### 2.4 Model evaluation
The model evaluation process can be started as follows.
```bash
python tools/eval.py \
-c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-o load_static_weights=False
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
About parameter description, see [1.4 Model evaluation](#14-model-evaluation) for details.
@ -204,30 +188,16 @@ About parameter description, see [1.4 Model evaluation](#14-model-evaluation) fo
After the training is completed, you can make predictions using the model obtained from training, as follows:
```bash
python tools/infer/infer.py \
-i image path \
--model MobileNetV3_large_x1_0 \
--pretrained_model "./output/MobileNetV3_large_x1_0/best_model/ppcls" \
--use_gpu True \
--load_static_weights False
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
Among them:
+ `image_file`(i): The path of the image file to be predicted, such as `./test.jpeg`;
+ `model`: Model name, such as `MobileNetV3_large_x1_0`;
+ `pretrained_model`: Weight file path, such as `./pretrained/MobileNetV3_large_x1_0_pretrained/`;
+ `use_gpu`: Whether to use the GPU, default by `True`;
+ `load_static_weights`: Whether to load the pre-trained model obtained from static image training, default by `False`;
+ `resize_short`: The length of the shortest side of the image that be scaled proportionally, default by `256`;
+ `resize`: The side length of the image that be center cropped from resize_shorted image, default by `224`;
+ `pre_label_image`: Whether to pre-label the image data, default value: `False`;
+ `pre_label_out_idr`: The output path of pre-labeled image data. When `pre_label_image=True`, many subfolders will be generated under this path; each subfolder represents a category and stores all the images that the model predicts to belong to that category.
+ `Infer.infer_imgs`: The path of the image file or folder to be predicted;
+ `Global.pretrained_model`: Weight file path, such as `./output/MobileNetV3_large_x1_0/best_model`;
**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.
For more detailed information, you can refer to [infer.py](../../../tools/infer/infer.py).
<a name="model_inference"></a>
## 4. Use the inference model to predict
PaddlePaddle supports inference using prediction engines, which will be introduced next.
@ -235,41 +205,38 @@ PaddlePaddle supports inference using prediction engines, which will be introduc
Firstly, you should export inference model using `tools/export_model.py`.
```bash
python tools/export_model.py \
--model MobileNetV3_large_x1_0 \
--pretrained_model ./output/MobileNetV3_large_x1_0/best_model/ppcls \
--output_path ./inference \
--class_dim 1000
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
```
Among them, the `--model` parameter is used to specify the model name, the `--pretrained_model` parameter is used to specify the model file path (the path does not need to include the model file suffix), `--output_path` is used to specify the storage path of the converted model, and `class_dim` means the number of classes of the model, 1000 by default.
**Note**:
1. If `--output_path=./inference`, then three files will be generated in the folder `inference`, they are `inference.pdiparams`, `inference.pdmodel` and `inference.pdiparams.info`.
2. You can specify the `shape` of the model input image by setting the parameter `--img_size`, the default is `224`, which means the shape of input image is `224*224`. If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, you need to set `--img_size=384`.
Among them, `Global.pretrained_model` parameter is used to specify the model file path that does not need to include the file suffix name.
The above command will generate the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), and then the inference engine can be used for inference:
Go to the deploy directory:
```
cd deploy
```
Use the inference engine to run inference. Because the mapping file of the ImageNet1k dataset is used by default, we should set `PostProcess.Topk.class_id_map_file` to `None`.
```bash
python tools/infer/predict.py \
--image_file image path \
--model_file "./inference/inference.pdmodel" \
--params_file "./inference/inference.pdiparams" \
--use_gpu=True \
--use_tensorrt=False
python3 python/predict_cls.py \
-c configs/inference_cls.yaml \
-o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
-o Global.inference_model_dir=../inference/ \
-o PostProcess.Topk.class_id_map_file=None
```
Among them:
+ `image_file`: The path of the image file to be predicted, such as `./test.jpeg`;
+ `model_file`: Model file path, such as `./MobileNetV3_large_x1_0/inference.pdmodel`;
+ `params_file`: Weight file path, such as `./MobileNetV3_large_x1_0/inference.pdiparams`;
+ `use_tensorrt`: Whether to use TensorRT, `True` by default;
+ `use_gpu`: Whether to use the GPU, `True` by default;
+ `enable_mkldnn`: Whether to use `MKL-DNN`, `False` by default. When both `use_gpu` and `enable_mkldnn` are set to `True`, the GPU is used and `enable_mkldnn` is ignored;
+ `resize_short`: The length of the shortest side after the image is scaled proportionally, `256` by default;
+ `resize`: The side length of the image center-cropped from the resized image, `224` by default;
+ `enable_calc_topk`: Whether to calculate the top-k accuracy of the prediction, `False` by default. The top-k accuracy is printed when set to `True`;
+ `gt_label_path`: Image name and label file, used when `enable_calc_topk` is `True` to get the image list and labels.
+ `Global.infer_imgs`: The path of the image file to be predicted;
+ `Global.inference_model_dir`: The directory of the inference model files, such as `../inference/`;
+ `Global.use_tensorrt`: Whether to use TensorRT, `False` by default;
+ `Global.use_gpu`: Whether to use the GPU, `True` by default;
+ `Global.enable_mkldnn`: Whether to use `MKL-DNN`, `False` by default. It only takes effect when `Global.use_gpu` is `False`.
+ `Global.use_fp16`: Whether to enable FP16, default by `False`;
**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.
If you want to evaluate the speed of the model, it is recommended to use [predict.py](../../../tools/infer/predict.py), and enable TensorRT to accelerate.
If you want to evaluate the speed of the model, it is recommended to enable TensorRT for acceleration on GPU, and MKL-DNN on CPU.
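For reference, the sketch below shows roughly how these switches map to the underlying Paddle Inference Python API when benchmarking an exported classification model. It is only a minimal sketch: the model paths and the random dummy input are placeholders, `python/predict_cls.py` already wraps this logic for you, and the exact TensorRT options may differ across Paddle versions.

```python
import numpy as np
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths; point these to your exported inference model files.
config = Config("../inference/inference.pdmodel", "../inference/inference.pdiparams")

use_gpu = True
if use_gpu:
    config.enable_use_gpu(8000, 0)              # initial GPU memory (MB), GPU id
    config.enable_tensorrt_engine(              # optional TensorRT acceleration
        workspace_size=1 << 30,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=PrecisionType.Float32,
        use_static=False,
        use_calib_mode=False)
else:
    config.disable_gpu()
    config.enable_mkldnn()                      # optional MKL-DNN acceleration on CPU
    config.set_cpu_math_library_num_threads(10)

predictor = create_predictor(config)
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))  # dummy input
predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
print(output_handle.copy_to_cpu().shape)
```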

View File

@ -120,7 +120,7 @@ python3 tools/train.py \
`-c` is used to specify the path to the configuration file, and `-o` is used to specify the parameters that need to be modified or added, where `-o Arch.Backbone.pretrained=True` indicates that the Backbone part uses the pre-trained model, in addition, `Arch.Backbone.pretrained` can also specify backbone.`pretrained` can also specify the address of a specific model weight file, which needs to be replaced with the path to your own pre-trained model weight file when using it. `-o Global.device=gpu` indicates that the GPU is used for training. If you want to use a CPU for training, you need to set `Global.device` to `cpu`.
For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_en.md) for specific configuration parameters.
For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_description_en.md) for specific configuration parameters.
Run the above commands to check the output log, an example is as follows:

View File

@ -43,6 +43,11 @@ The detection model with the recognition inference model for the 4 directions (L
| Product Recognition Model | Product Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
| Vehicle ReID Model | Vehicle ReID Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | - | - |
| Models Introduction | Recommended Scenarios | inference Model | Predict Config File | Config File to Build Index Database |
| ------------ | ------------- | -------- | ------- | -------- |
| Lightweight generic mainbody detection model | General Scenarios |[Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | - | - |
| Lightweight generic recognition model | General Scenarios | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
Demo data in this tutorial can be downloaded here: [download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar).
@ -50,6 +55,7 @@ Demo data in this tutorial can be downloaded here: [download link](https://paddl
**Attention**
1. If you do not have wget installed on Windows, you can download the model by copying the link into your browser and unzipping it in the appropriate folder; for Linux or macOS users, you can right-click and copy the download link to download it via the `wget` command.
2. If you want to install `wget` on macOS, you can run the following command.
3. The predict config file and the index-building config file of the lightweight generic recognition model currently reuse those of the server-side product recognition model. You can modify the model path in them to complete the index building and prediction.
```shell
# install homebrew
@ -123,6 +129,13 @@ The `models` folder should have the following file structure.
│ └── inference.pdmodel
```
**Attention**
If you want to use the lightweight generic recognition model, you need to re-extract the features of the demo data and rebuild the index, as follows:
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer
```
<a name="Product_recognition_and_retrival"></a>
### 2.2 Product Recognition and Retrieval

Binary file not shown.

Before

Width:  |  Height:  |  Size: 58 KiB

After

Width:  |  Height:  |  Size: 60 KiB

View File

@ -31,9 +31,9 @@
| 模型 | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.434 | 6.222 | 7.39 | 21.82 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) |
| ResNet50_vd_<br>ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_<br>ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_<br>26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| ResNet50_vd_ssld | 0.830 | 0.792 | 0.039 | 3.531 | 8.090 | 8.67 | 25.58 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) |
| ResNet101_vd_ssld | 0.837 | 0.802 | 0.035 | 6.117 | 13.762 | 16.1 | 44.57 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.798 | 0.033 | 4.527 | 9.657 | 8.37 | 25.06 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 8.087 | 17.312 | 16.67 | 45.22 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 14.678 | 32.350 | 31.49 | 76.21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 7.406 | 13.297 | 4.14 | 21.29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) |
@ -45,16 +45,44 @@
| 模型 | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1 | Flops(G) | Params(M) | 模型大小(M) | 下载地址 |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|
| MobileNetV1_<br>ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_<br>ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_<br>small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_<br>large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_<br>x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
| MobileNetV1_ssld | 0.779 | 0.710 | 0.069 | 32.523 | 1.11 | 4.19 | 16 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) |
| MobileNetV2_ssld | 0.767 | 0.722 | 0.045 | 23.318 | 0.6 | 3.44 | 14 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) |
| MobileNetV3_small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.635 | 0.026 | 1.66 | 6.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) |
| MobileNetV3_large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 19.308 | 0.45 | 5.47 | 21 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) |
| MobileNetV3_small_x1_0_ssld | 0.713 | 0.682 | 0.031 | 6.546 | 0.123 | 2.94 | 12 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) |
| GhostNet_x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.983 | 0.44 | 7.3 | 29 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) |
* Intel CPU端知识蒸馏模型
| 模型 | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | Flops(M) | Params(M) | 下载地址 |
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|
| PPLCNet_x0_5_ssld | 0.661 | 0.631 | 0.030 | 2.05 | 47 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) |
| PPLCNet_x1_0_ssld | 0.744 | 0.713 | 0.033 | 2.46 | 161 | 3.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) |
| PPLCNet_x2_5_ssld | 0.808 | 0.766 | 0.042 | 5.39 | 906 | 9.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) |
* 注: `Reference Top-1 Acc`表示PaddleClas基于ImageNet1k数据集训练得到的预训练模型精度。
<a name="PPLCNet系列"></a>
### PPLCNet系列
PPLCNet系列模型的精度、速度指标如下表所示更多关于该系列的模型介绍可以参考[PPLCNet系列模型文档](./models/PPLCNet.md)。
| 模型 | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | 下载地址 |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 1.74 | 18 | 1.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 1.92 | 29 | 1.6 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 2.05 | 47 | 1.9 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 2.29 | 99 | 2.4 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 2.46 | 161 | 3.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 3.19 | 342 | 4.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 4.27 | 590 | 6.5 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 5.39 | 906 | 9.0 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) |
<a name="ResNet及其Vd系列"></a>
### ResNet及其Vd系列
@ -429,7 +457,7 @@ ViTVision Transformer与DeiTData-efficient Image Transformers系列
| 模型 | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | Flops(G) | Params(M) | 下载地址 |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ |
| TNT_small | 0.8121 |0.9563 | | | 5.2 | 23.8 | [下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams) |
**注**TNT模型的数据预处理部分`NormalizeImage`中的`mean`与`std`均为0.5。

View File

@ -25,58 +25,66 @@ tar -xf NUS-SCENE-dataset.tar
cd ../../
```
## 二、环境准备
### 2.1 下载预训练模型
本例展示基于ResNet50_vd模型的多标签分类流程因此首先下载ResNet50_vd的预训练模型
```bash
mkdir pretrained
cd pretrained
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_pretrained.pdparams
cd ../
```
## 三、模型训练
## 二、模型训练
```shell
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch \
--gpus="0" \
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```
训练10epoch之后验证集最好的正确率应该在0.72左右。
训练10epoch之后验证集最好的正确率应该在0.95左右。
## 四、模型评估
## 三、模型评估
```bash
python tools/eval.py \
-c ./configs/quick_start/ResNet50_vd_multilabel.yaml \
-o pretrained_model="./output/ResNet50_vd/best_model/ppcls" \
-o load_static_weights=False
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
评估指标采用mAP验证集的mAP应该在0.57左右。
## 五、模型预测
## 四、模型预测
```bash
python tools/infer/infer.py \
-i "./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/0199_434752251.jpg" \
--model ResNet50_vd \
--pretrained_model "./output/ResNet50_vd/best_model/ppcls" \
--use_gpu True \
--load_static_weights False \
--multilabel True \
--class_num 33
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
得到类似下面的输出:
```
class id: 3, probability: 0.6025
class id: 23, probability: 0.5491
class id: 32, probability: 0.7006
```
```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```
## 五、基于预测引擎预测
### 5.1 导出inference model
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
inference model的路径默认在当前路径下`./inference`
### 5.2 基于预测引擎预测
首先进入deploy目录下
```bash
cd ./deploy
```
通过预测引擎推理预测:
```
python3 python/predict_cls.py \
-c configs/inference_multilabel_cls.yaml
```
得到类似下面的输出:
```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```

View File

@ -22,8 +22,8 @@
以下为各应用在不同数据集下的预训练模型
- 车辆细分类:[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.1_pretrained.pdparams)
- 车辆ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.0_pretrained.pdparams)
- 车辆细分类:[CompCars](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams)
- 车辆ReID[VERI-Wild](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams)
- 动漫人物识别:[iCartoon](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/cartoon_rec_ResNet50_iCartoon_v1.0_pretrained.pdparams)
- Logo识别[Logo3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.0_pretrained.pdparams)
- 商品识别: [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.0.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)
- Logo识别[Logo3K](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/logo_rec_ResNet50_Logo3K_v1.1_pretrained.pdparams)
- 商品识别: [Inshop](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Inshop_pretrained_v1.1.pdparams)、[Aliproduct](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams)

View File

@ -7,7 +7,7 @@
* 图像分类、识别、检索领域大佬众多,模型和论文更新速度也很快,本文档回答主要依赖有限的项目实践,难免挂一漏万,如有遗漏和不足,也希望有识之士帮忙补充和修正,万分感谢。
## 目录
* [近期更新](#近期更新)(2021.08.11)
* [近期更新](#近期更新)(2021.09.08)
* [精选](#精选)
* [1. 理论篇](#1.理论篇)
* [1.1 PaddleClas基础知识](#1.1PaddleClas基础知识)
@ -27,60 +27,69 @@
<a name="近期更新"></a>
## 近期更新
#### Q2.6.2: 导出inference模型进行预测部署准确率异常为什么呢
**A**: 该问题通常是由于在导出时未能正确加载模型参数导致的,首先检查模型导出时的日志,是否存在类似下述内容:
#### Q2.1.7: 在训练时,出现如下报错信息:`ERROR: Unexpected segmentation fault encountered in DataLoader workers.`,如何排查解决问题呢?
**A**:尝试将训练配置文件中的字段 `num_workers` 设置为 `0`;尝试将训练配置文件中的字段 `batch_size` 调小一些;检查数据集格式和配置文件中的数据集路径是否正确。
#### Q2.1.8: 如何在训练时使用 `Mixup``Cutmix`
**A**
* `Mixup` 的使用方法请参考 [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65)`Cutmix` 请参考 [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65)。
* 在使用 `Mixup``Cutmix` 时,需要注意:
* 配置文件中的 `Loss.Train.CELoss` 需要修改为 `Loss.Train.MixCELoss`,可参考 [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26)
* 使用 `Mixup``Cutmix` 做训练时无法计算训练的精度Acc指标因此需要在配置文件中取消 `Metric.Train.TopkAcc` 字段,可参考 [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128)。
#### Q2.1.9: 训练配置yaml文件中字段 `Global.pretrain_model``Global.checkpoints` 分别用于配置什么呢?
**A**
* 当需要 `fine-tune` 时,可以通过字段 `Global.pretrain_model` 配置预训练模型权重文件的路径,预训练模型权重文件后缀名通常为 `.pdparams`
* 在训练过程中训练程序会自动保存每个epoch结束时的断点信息包括优化器信息 `.pdopt` 和模型权重信息 `.pdparams`。在训练过程意外中断等情况下,需要恢复训练时,可以通过字段 `Global.checkpoints` 配置训练过程中保存的断点信息文件,例如通过配置 `checkpoints: ./output/ResNet18/epoch_18` 即可恢复18epoch训练结束时的断点信息PaddleClas将自动加载 `epoch_18.pdopt``epoch_18.pdparams`从19epoch继续训练。
#### Q2.6.3: 如何将模型转为 `ONNX` 格式?
**A**Paddle支持两种转ONNX格式模型的方式且依赖于 `paddle2onnx` 工具,首先需要安装 `paddle2onnx`
```shell
pip install paddle2onnx
```
UserWarning: Skip loading for ***. *** is not found in the provided dict.
```
如果存在,则说明模型权重未能加载成功,请进一步检查配置文件中的 `Global.pretrained_model` 字段,是否正确配置了模型权重文件的路径。模型权重文件后缀名通常为 `pdparams`,注意在配置该路径时无需填写文件后缀名。
#### Q2.1.4: 数据预处理中,不想对输入数据进行裁剪,该如何设置?或者如何设置剪裁的尺寸。
**A**: PaddleClas 支持的数据预处理算子可在这里查看:`ppcls/data/preprocess/__init__.py`,所有支持的算子均可在配置文件中进行配置,配置的算子名称需要和算子类名一致,参数与对应算子类的构造函数参数一致。如不需要对图像裁剪,则可去掉 `CropImage`、`RandCropImage`,使用 `ResizeImage` 替换即可可通过其参数设置不同的resize方式 使用 `size` 参数则直接将图像缩放至固定大小,使用`resize_short` 参数则会维持图像宽高比进行缩放。设置裁剪尺寸时,可通过 `CropImage` 算子的 `size` 参数,或 `RandCropImage` 算子的 `size` 参数。
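下面给出一个示意性的组合(仅为示意,假设这些算子可以按如下方式从 `ppcls.data.preprocess` 导入,构造参数以 `ppcls/data/preprocess` 中的实际实现为准),展示“不裁剪、仅按短边等比缩放”的预处理写法:

```python
from ppcls.data.preprocess import ResizeImage, NormalizeImage, ToCHWImage  # 假设的导入方式

# 去掉 CropImage / RandCropImage只保留按短边等比缩放
transforms = [
    ResizeImage(resize_short=256),      # 维持宽高比,短边缩放到 256若改用 size=224 则直接缩放到固定大小
    NormalizeImage(scale=1.0 / 255.0,
                   mean=[0.485, 0.456, 0.406],
                   std=[0.229, 0.224, 0.225],
                   order=''),
    ToCHWImage(),
]

def preprocess(img):
    # img 为 HWC 排布的 numpy 数组,依次经过各算子处理
    for t in transforms:
        img = t(img)
    return img
```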
* 从 inference model 转为 ONNX 格式模型:
#### Q1.1.3: Momentum 优化器中的 momentum 参数是什么意思呢?
**A**: Momentum 优化器是在 SGD 优化器的基础上引入了“动量”的概念。在 SGD 优化器中,在 `t+1` 时刻,参数 `w` 的更新可表示为:
```latex
w_{t+1} = w_t - lr * grad
```
其中,`lr` 为学习率,`grad` 为此时参数 `w` 的梯度。在引入动量的概念后,参数 `w` 的更新可表示为:
```latex
v_{t+1} = m * v_t + lr * grad
w_{t+1} = w_t - v_{t+1}
```
其中,`m` 即为动量 `momentum`,表示累积动量的加权值,一般取 `0.9`,当取值小于 `1` 时,则越早期的梯度对当前的影响越小,例如,当动量参数 `m``0.9` 时,在 `t` 时刻,`t-5` 的梯度加权值为 `0.9 ^ 5 = 0.59049`,而 `t-2` 时刻的梯度加权值为 `0.9 ^ 2 = 0.81`。因此,太过“久远”的梯度信息对当前的参考意义很小,而“最近”的历史梯度信息对当前影响更大,这也是符合直觉的。
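下面用一小段与框架无关的示意代码模拟上述更新过程(梯度序列为假设值,仅作演示),可以直观看到历史梯度按 `m` 的幂次衰减累积:

```python
# 模拟带动量的 SGD 更新v_{t+1} = m * v_t + lr * grad; w_{t+1} = w_t - v_{t+1}
m, lr = 0.9, 0.1
w, v = 1.0, 0.0
grads = [0.5, 0.4, 0.3, 0.2, 0.1]   # 假设的一串梯度

for t, grad in enumerate(grads, 1):
    v = m * v + lr * grad           # 累积动量
    w = w - v                       # 更新参数
    print(f"step {t}: v = {v:.4f}, w = {w:.4f}")

# 往前第 k 步的梯度对当前动量的贡献权重约为 m ** k
print("t-5 的梯度权重:", 0.9 ** 5)   # 0.59049,与上文一致
```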
以动态图导出的 `combined` 格式 inference model包含 `.pdmodel``.pdiparams` 两个文件)为例,使用以下命令进行模型格式转换:
```shell
paddle2onnx --model_dir ${model_path} --model_filename ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
```
上述命令中:
* `model_dir`:该参数下需要包含 `.pdmodel``.pdiparams` 两个文件;
* `model_filename`:该参数用于指定参数 `model_dir` 下的 `.pdmodel` 文件路径;
* `params_filename`:该参数用于指定参数 `model_dir` 下的 `.pdiparams` 文件路径;
* `save_file`:该参数用于指定转换后的模型保存目录路径。
<div align="center">
<img src="../../images/faq/momentum.jpeg" width="400">
</div>
关于静态图导出的非 `combined` 格式的 inference model通常包含文件 `__model__` 和多个参数文件)转换模型格式,以及更多参数说明请参考 paddle2onnx 官方文档 [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9)。
*该图来自 `https://blog.csdn.net/tsyccnh/article/details/76270707`*
* 直接从模型组网代码导出ONNX格式模型
通过引入动量的概念,在参数更新时考虑了历史更新的影响,因此可以加快收敛速度,也改善了 `SGD` 优化器带来的损失cost、loss震荡问题。
以动态图模型组网代码为例,模型类为继承于 `paddle.nn.Layer` 的子类,代码如下所示:
#### Q1.1.4: PaddleClas 是否有 `Fixing the train-test resolution discrepancy` 这篇论文的实现呢?
**A**: 目前 PaddleClas 没有实现。如果需要可以尝试自己修改代码。简单来说该论文所提出的思想是使用较大分辨率作为输入对已经训练好的模型最后的FC层进行fine-tune。具体操作上首先在较低分辨率的数据集上对模型网络进行训练完成训练后对网络除最后的FC层外的其他层的权重设置参数 `stop_gradient=True`然后使用较大分辨率的输入对网络进行fine-tune训练。
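如果希望自行尝试,下面是一个示意性的写法(仅为示意,用 `paddle.vision` 中的 ResNet50 代替已在低分辨率上训练好的模型,最后分类层的属性名假设为 `fc`,实际以具体网络实现为准):

```python
import paddle
from paddle.vision.models import resnet50

# 假设:该模型已在较低分辨率(如 224x224数据上训练完成
model = resnet50(num_classes=1000)

# 除最后的全连接层外,其余参数全部停止梯度更新
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.stop_gradient = True

# 随后构建更大分辨率(如 320x320的训练数据仅对 FC 层继续 fine-tune 若干个 epoch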
```python
import paddle
from paddle.static import InputSpec
#### Q1.6.2: PaddleClas 图像识别用于 Eval 的配置文件中,`Query` 和 `Gallery` 配置具体是用于做什么呢?
**A**: `Query``Gallery` 均为数据集配置,其中 `Gallery` 用于配置底库数据,`Query` 用于配置验证集。在进行 Eval 时,首先使用模型对 `Gallery` 底库数据进行前向计算特征向量,特征向量用于构建底库,然后模型对 `Query` 验证集中的数据进行前向计算特征向量,再与底库计算召回率等指标。
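下面用 numpy 写一个极简示意,说明上述流程中 Recall@1 这类指标的计算方式(实际中特征由模型前向计算得到并做归一化,这里用随机数代替):

```python
import numpy as np

np.random.seed(0)
# 假设底库 1000 张、查询集 100 张,特征维度 512且均已 L2 归一化
gallery_feas = np.random.randn(1000, 512).astype("float32")
gallery_feas /= np.linalg.norm(gallery_feas, axis=1, keepdims=True)
gallery_labels = np.random.randint(0, 50, size=(1000,))

query_feas = np.random.randn(100, 512).astype("float32")
query_feas /= np.linalg.norm(query_feas, axis=1, keepdims=True)
query_labels = np.random.randint(0, 50, size=(100,))

# 余弦相似度矩阵:每个 query 与底库所有样本的相似度
sim = query_feas @ gallery_feas.T            # (100, 1000)
top1_idx = sim.argmax(axis=1)                # 每个 query 最相似的底库样本
recall_at_1 = (gallery_labels[top1_idx] == query_labels).mean()
print("Recall@1:", recall_at_1)
```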
class SimpleNet(paddle.nn.Layer):
def __init__(self):
pass
def forward(self, x):
pass
#### Q2.1.5: PaddlePaddle 安装后,使用报错,无法导入 paddle 下的任何模块import paddle.xxx是为什么呢
**A**: 首先可以使用以下代码测试 Paddle 是否安装正确:
```python
import paddle
paddle.utils.install_check.run_check()
```
正确安装时,通常会有如下提示:
```
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
```
如未能安装成功,则会有相应问题的提示。
另外在同时安装CPU版本和GPU版本Paddle后由于两个版本存在冲突需要将两个版本全部卸载然后重新安装所需要的版本。
net = SimpleNet()
x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
```
其中:
* `InputSpec()` 函数用于描述模型输入的签名信息,包括输入数据的 `shape`、`type` 和 `name`(可省略);
* `paddle.onnx.export()` 函数需要指定模型组网对象 `net`,导出模型的保存路径 `save_path`,模型的输入数据描述 `input_spec`
#### Q2.1.6: 使用PaddleClas训练时如何设置仅保存最优模型不想保存中间模型。
**A**: PaddleClas在训练过程中会保存/更新以下三类模型:
1. 最新的模型(`latest.pdopt` `latest.pdparams``latest.pdstates`),当训练意外中断时,可使用最新保存的模型恢复训练;
2. 最优的模型(`best_model.pdopt``best_model.pdparams``best_model.pdstates`
3. 训练过程中一个epoch结束时的断点`epoch_xxx.pdopt``epoch_xxx.pdparams``epoch_xxx.pdstates`)。训练配置文件中 `Global.save_interval` 字段表示该模型的保存间隔。将该字段设置大于总epochs数则不再保存中间断点模型。
需要注意,`paddlepaddle` 版本需大于 `2.0.0`。关于 `paddle.onnx.export()` 函数的更多参数说明请参考[paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export)。
#### Q2.5.4: 在 build 检索底库时,参数 `pq_size` 应该如何设置?
**A**`pq_size` 是PQ检索算法的参数。PQ检索算法可以简单理解为“分层”检索算法`pq_size` 是每层的“容量”因此该参数的设置会影响检索性能不过在底库总数据量不太大小于10000张的情况下这个参数对性能的影响很小因此对于大多数使用场景而言在构建底库时无需修改该参数。关于PQ检索算法的更多内容可以查看相关[论文](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf)。
<a name="精选"></a>
## 精选
@ -204,6 +213,22 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
2. 最优的模型(`best_model.pdopt``best_model.pdparams``best_model.pdstates`
3. 训练过程中一个epoch结束时的断点`epoch_xxx.pdopt``epoch_xxx.pdparams``epoch_xxx.pdstates`)。训练配置文件中 `Global.save_interval` 字段表示该模型的保存间隔。将该字段设置大于总epochs数则不再保存中间断点模型。
#### Q2.1.7: 在训练时,出现如下报错信息:`ERROR: Unexpected segmentation fault encountered in DataLoader workers.`,如何排查解决问题呢?
**A**:尝试将训练配置文件中的字段 `num_workers` 设置为 `0`;尝试将训练配置文件中的字段 `batch_size` 调小一些;检查数据集格式和配置文件中的数据集路径是否正确。
#### Q2.1.8: 如何在训练时使用 `Mixup``Cutmix`
**A**
* `Mixup` 的使用方法请参考 [Mixup](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65)`Cutmix` 请参考 [Cutmix](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65)。
* 在使用 `Mixup``Cutmix` 时,需要注意:
* 配置文件中的 `Loss.Train.CELoss` 需要修改为 `Loss.Train.MixCELoss`,可参考 [MixCELoss](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L23-L26)
* 使用 `Mixup``Cutmix` 做训练时无法计算训练的精度Acc指标因此需要在配置文件中取消 `Metric.Train.TopkAcc` 字段,可参考 [Metric.Train.TopkAcc](https://github.com/PaddlePaddle/PaddleClas/blob/cf9fc9363877f919996954a63716acfb959619d0/ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128)。
#### Q2.1.9: 训练配置yaml文件中字段 `Global.pretrain_model``Global.checkpoints` 分别用于配置什么呢?
**A**
* 当需要 `fine-tune` 时,可以通过字段 `Global.pretrain_model` 配置预训练模型权重文件的路径,预训练模型权重文件后缀名通常为 `.pdparams`
* 在训练过程中训练程序会自动保存每个epoch结束时的断点信息包括优化器信息 `.pdopt` 和模型权重信息 `.pdparams`。在训练过程意外中断等情况下,需要恢复训练时,可以通过字段 `Global.checkpoints` 配置训练过程中保存的断点信息文件,例如通过配置 `checkpoints: ./output/ResNet18/epoch_18` 即可恢复18epoch训练结束时的断点信息PaddleClas将自动加载 `epoch_18.pdopt``epoch_18.pdparams`从19epoch继续训练。
<a name="2.2图像分类"></a>
### 2.2 图像分类
@ -255,6 +280,9 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
#### Q2.5.3: Mac重新编译index.so时报错如下clang: error: unsupported option '-fopenmp', 该如何处理?
**A**:该问题已经解决。可以参照[文档](../../../develop/deploy/vector_search/README.md)重新编译 index.so。
#### Q2.5.4: 在 build 检索底库时,参数 `pq_size` 应该如何设置?
**A**`pq_size` 是PQ检索算法的参数。PQ检索算法可以简单理解为“分层”检索算法`pq_size` 是每层的“容量”因此该参数的设置会影响检索性能不过在底库总数据量不太大小于10000张的情况下这个参数对性能的影响很小因此对于大多数使用场景而言在构建底库时无需修改该参数。关于PQ检索算法的更多内容可以查看相关[论文](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf)。
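下面给出一个与 PaddleClas 检索实现无关的 PQ乘积量化示意代码仅演示“把特征切分为若干子段、每段分别聚类并用类中心编号编码”的基本思想并非 `pq_size` 参数的实际实现):

```python
import numpy as np

def kmeans(x, k, iters=20):
    # 极简 k-means仅作演示
    centers = x[np.random.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return centers

np.random.seed(0)
feats = np.random.randn(2000, 128).astype("float32")   # 假设的底库特征
m, k = 4, 16                                            # 切成 4 段,每段 16 个聚类中心(可类比“每层容量”)
sub_dim = feats.shape[1] // m

codes = []
for i in range(m):
    sub = feats[:, i * sub_dim:(i + 1) * sub_dim]
    cb = kmeans(sub, k)
    codes.append(((sub[:, None, :] - cb[None]) ** 2).sum(-1).argmin(1))
codes = np.stack(codes, axis=1)   # 每条特征被压缩为 m 个小整数编码
print(codes.shape, codes[:2])
```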
<a name="2.6模型预测部署"></a>
### 2.6 模型预测部署
@ -267,3 +295,48 @@ PaddlePaddle is installed successfully! Let's start deep learning with PaddlePad
UserWarning: Skip loading for ***. *** is not found in the provided dict.
```
如果存在,则说明模型权重未能加载成功,请进一步检查配置文件中的 `Global.pretrained_model` 字段,是否正确配置了模型权重文件的路径。模型权重文件后缀名通常为 `pdparams`,注意在配置该路径时无需填写文件后缀名。
#### Q2.6.3: 如何将模型转为 `ONNX` 格式?
**A**Paddle支持两种转ONNX格式模型的方式且依赖于 `paddle2onnx` 工具,首先需要安装 `paddle2onnx`
```shell
pip install paddle2onnx
```
* 从 inference model 转为 ONNX 格式模型:
以动态图导出的 `combined` 格式 inference model包含 `.pdmodel``.pdiparams` 两个文件)为例,使用以下命令进行模型格式转换:
```shell
paddle2onnx --model_dir ${model_path} --model_filename ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
```
上述命令中:
* `model_dir`:该参数下需要包含 `.pdmodel``.pdiparams` 两个文件;
* `model_filename`:该参数用于指定参数 `model_dir` 下的 `.pdmodel` 文件路径;
* `params_filename`:该参数用于指定参数 `model_dir` 下的 `.pdiparams` 文件路径;
* `save_file`:该参数用于指定转换后的模型保存目录路径。
关于静态图导出的非 `combined` 格式的 inference model通常包含文件 `__model__` 和多个参数文件)转换模型格式,以及更多参数说明请参考 paddle2onnx 官方文档 [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md#%E5%8F%82%E6%95%B0%E9%80%89%E9%A1%B9)。
* 直接从模型组网代码导出ONNX格式模型
以动态图模型组网代码为例,模型类为继承于 `paddle.nn.Layer` 的子类,代码如下所示:
```python
import paddle
from paddle.static import InputSpec
class SimpleNet(paddle.nn.Layer):
def __init__(self):
pass
def forward(self, x):
pass
net = SimpleNet()
x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
```
其中:
* `InputSpec()` 函数用于描述模型输入的签名信息,包括输入数据的 `shape`、`type` 和 `name`(可省略);
* `paddle.onnx.export()` 函数需要指定模型组网对象 `net`,导出模型的保存路径 `save_path`,模型的输入数据描述 `input_spec`
需要注意,`paddlepaddle` 版本需大于 `2.0.0`。关于 `paddle.onnx.export()` 函数的更多参数说明请参考[paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/export_cn.html#export)。

View File

@ -0,0 +1,41 @@
# PPLCNet系列
## 概述
PPLCNet系列是百度PaddleCV团队提出的一种在Intel-CPU上表现优异的网络作者总结了一些在Intel-CPU上可以提升模型精度但几乎不增加推理耗时的方法将这些方法组合成了一个新的网络即PPLCNet。与其他轻量级网络相比PPLCNet可以在相同延时下取得更高的精度。PPLCNet已在图像分类、目标检测、语义分割上表现出了强大的竞争力。
## 精度、FLOPS和参数量
| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |
## 基于Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz的预测速度
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |

View File

@ -0,0 +1,41 @@
# PPLCNet系列
## 概述
PPLCNet系列是百度PaddleCV团队提出的一种在Intel-CPU上表现优异的网络作者总结了一些在Intel-CPU上可以提升模型精度但几乎不增加推理耗时的方法将这些方法组合成了一个新的网络即PPLCNet。与其他轻量级网络相比PPLCNet可以在相同延时下取得更高的精度。PPLCNet已在图像分类、目标检测、语义分割上表现出了强大的竞争力。
## 精度、FLOPS和参数量
| Models | Top1 | Top5 | FLOPs<br>(M) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 18 | 1.5 |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 29 | 1.6 |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 47 | 1.9 |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 99 | 2.4 |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 161 | 3.0 |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 342 | 4.5 |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 590 | 6.5 |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 906 | 9.0 |
| PPLCNet_x0_5_ssld |0.6610 | 0.8646 | 47 | 1.9 |
| PPLCNet_x1_0_ssld |0.7439 | 0.9209 | 161 | 3.0 |
| PPLCNet_x2_5_ssld |0.8082 | 0.9533 | 906 | 9.0 |
## 基于Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz的预测速度
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| PPLCNet_x0_25 | 224 | 256 | 1.74 |
| PPLCNet_x0_35 | 224 | 256 | 1.92 |
| PPLCNet_x0_5 | 224 | 256 | 2.05 |
| PPLCNet_x0_75 | 224 | 256 | 2.29 |
| PPLCNet_x1_0 | 224 | 256 | 2.46 |
| PPLCNet_x1_5 | 224 | 256 | 3.19 |
| PPLCNet_x2_0 | 224 | 256 | 4.27 |
| PPLCNet_x2_5 | 224 | 256 | 5.39 |
| PPLCNet_x0_5_ssld | 224 | 256 | 2.05 |
| PPLCNet_x1_0_ssld | 224 | 256 | 2.46 |
| PPLCNet_x2_5_ssld | 224 | 256 | 5.39 |

View File

@ -117,7 +117,7 @@ python3 tools/train.py \
其中,`-c`用于指定配置文件的路径,`-o`用于指定需要修改或者添加的参数,其中`-o Arch.Backbone.pretrained=True`表示Backbone部分使用预训练模型此外`Arch.Backbone.pretrained`也可以指定具体的模型权重文件的地址,使用时需要换成自己的预训练模型权重文件的路径。`-o Global.device=gpu`表示使用GPU进行训练。如果希望使用CPU进行训练则需要将`Global.device`设置为`cpu`。
更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config.md)。
更详细的训练配置,也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config_description.md)。
运行上述命令,可以看到输出日志,示例如下:
@ -245,4 +245,4 @@ python3 tools/export_model.py \
- 平均检索精度(mAP)
- AP: AP指的是不同召回率上的正确率的平均值
- mAP: 测试集中所有图片对应的AP的平均值计算方式可参考下面的示意代码
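下面是一段示意性的计算代码(按最常见的定义,对每个命中位置上的精度取平均,即“不同召回率上的正确率的平均值”,具体实现以评测脚本为准):

```python
import numpy as np

def average_precision(rel):
    # rel: 按相似度从高到低排序后的 0/1 相关性标记
    rel = np.asarray(rel, dtype="float32")
    if rel.sum() == 0:
        return 0.0
    cum_correct = np.cumsum(rel)
    precision_at_k = cum_correct / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

# 两张查询图的检索结果1 表示该位置的底库图与查询图同类
aps = [average_precision([1, 0, 1, 0]), average_precision([0, 1, 1, 1])]
print("mAP:", sum(aps) / len(aps))   # 所有查询图 AP 的平均值
```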

View File

@ -34,6 +34,8 @@
检测模型与4个方向(Logo、动漫人物、车辆、商品)的识别inference模型、测试数据下载地址以及对应的配置文件地址如下。
服务器端通用主体检测模型与各方向识别模型:
| 模型简介 | 推荐场景 | inference模型 | 预测配置文件 | 构建索引库的配置文件 |
| ------------ | ------------- | -------- | ------- | -------- |
| 通用主体检测模型 | 通用场景 |[模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | - |
@ -43,6 +45,12 @@
| 商品识别模型 | 商品场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
| 车辆ReID模型 | 车辆ReID场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | - | - |
轻量级通用主体检测模型与轻量级通用识别模型:
| 模型简介 | 推荐场景 | inference模型 | 预测配置文件 | 构建索引库的配置文件 |
| ------------ | ------------- | -------- | ------- | -------- |
| 轻量级通用主体检测模型 | 通用场景 |[模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | - | - |
| 轻量级通用识别模型 | 通用场景 | [模型下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
本章节demo数据下载地址如下: [数据下载链接](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar)。
@ -50,6 +58,7 @@
1. windows 环境下如果没有安装wget,可以按照下面的步骤安装wget与tar命令也可以在下载模型时将链接复制到浏览器中下载并解压放置在相应目录下linux或者macOS用户可以右键点击然后复制下载链接即可通过`wget`命令下载。
2. 如果macOS环境下没有安装`wget`命令,可以运行下面的命令进行安装。
3. 轻量级通用识别模型的预测配置文件和构建索引的配置文件目前使用的是服务器端商品识别模型的配置,您可以自行修改模型的路径完成相应的索引构建和识别预测。
```shell
# 安装 homebrew
@ -124,6 +133,13 @@ wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognit
│ └── inference.pdmodel
```
**注意**
如果使用轻量级通用识别模型Demo 数据需要重新提取特征、重新构建索引,方式如下:
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer
```
<a name="商品识别与检索"></a>
### 2.2 商品识别与检索

View File

@ -0,0 +1,218 @@
# 主体检测
主体检测技术是目前应用非常广泛的一种检测技术,它指的是检测出图片中一个或者多个主体的坐标位置,然后将图像中的对应区域裁剪下来,进行识别,从而完成整个识别过程。主体检测是识别任务的前序步骤,可以有效提升识别精度。
本部分主要从数据集、模型选择和模型训练 3 个方面对该部分内容进行介绍。
## 1. 数据集
在 PaddleClas 的识别任务中,训练主体检测模型时主要用到了以下几个数据集。
| 数据集 | 数据量 | 主体检测任务中使用的数据量 | 场景 | 数据集地址 |
| :------------: | :-------------: | :-------: | :-------: | :--------: |
| Objects365 | 170W | 6k | 通用场景 | [地址](https://www.objects365.org/overview.html) |
| COCO2017 | 12W | 5k | 通用场景 | [地址](https://cocodataset.org/) |
| iCartoonFace | 2k | 2k | 动漫人脸检测 | [地址](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | Logo检测 | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | 商品检测 | [地址](https://rpc-dataset.github.io/) |
在实际训练的过程中,将所有数据集混合在一起。由于是主体检测,这里将所有标注出的检测框对应的类别都修改为 `前景` 的类别,最终融合的数据集中只包含 1 个类别,即前景。
## 2. 模型选择
目标检测方法种类繁多比较常用的有两阶段检测器如FasterRCNN系列等单阶段检测器如YOLO、SSD等anchor-free检测器如PicoDet、FCOS等。PaddleDetection中针对服务端使用场景自研了 PP-YOLO 系列模型针对端侧CPU和移动端等使用场景自研了 PicoDet 系列模型,在服务端和端侧均处于业界较为领先的水平。
基于上述研究PaddleClas 中提供了 2 个通用主体检测模型,为轻量级与服务端主体检测模型,分别适用于端侧场景以及服务端场景。下面的表格中给出了在上述 5 个数据集上的平均 mAP 以及它们的模型大小、预测速度对比信息。
| 模型 | 模型结构 | 预训练模型下载地址 | inference模型下载地址 | mAP | inference模型大小(MB) | 单张图片预测耗时(不包含预处理)(ms) |
| :------------: | :-------------: | :------: | :-------: | :--------: | :-------: | :--------: |
| 轻量级主体检测模型 | PicoDet | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | 40.1% | 30.1 | 29.8 |
| 服务端主体检测模型 | PP-YOLOv2 | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams) | [地址](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | 42.5% | 210.5 | 466.6 |
* 注意
* 速度评测机器的CPU具体信息为`Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`,速度指标为开启 mkldnn ,线程数设置为 10 测试得到。
* 主体检测的预处理过程较为耗时,平均每张图在上述机器上的时间在 40~55 ms 左右,没有包含在上述的预测耗时统计中。
### 2.1 轻量级主体检测模型
PicoDet 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出是一个适用于CPU或者移动端场景的目标检测算法。具体地它融合了下面一系列优化算法。
- [ATSS](https://arxiv.org/abs/1912.02424)
- [Generalized Focal Loss](https://arxiv.org/abs/2006.04388)
- 余弦学习率策略
- Cycle-EMA
- 轻量级检测 head
更多关于 PicoDet 的优化细节与 benchmark 可以参考 [PicoDet 系列模型介绍](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/README.md)。
在轻量级主体检测任务中,为了更好地兼顾检测速度与效果,我们使用 PPLCNet_x2_5 作为主体检测模型的骨干网络,同时将训练与预测的图像尺度修改为了 640x640其余配置与 [picodet_m_shufflenetv2_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/picodet_m_shufflenetv2_416_coco.yml)完全一致。将数据集更换为自定义的主体检测数据集,进行训练,最终得到检测模型。
### 2.2 服务端主体检测模型
PP-YOLO 由 [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) 提出,从骨干网络、数据增广、正则化策略、损失函数、后处理等多个角度对 yolov3 模型进行深度优化,最终在"速度-精度"方面达到了业界领先的水平。具体地,优化的策略如下。
- 更优的骨干网络: ResNet50vd-DCN
- 更大的训练batch size: 8 GPUs每GPU batch_size=24对应调整学习率和迭代轮数
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- 更优的预训练模型
更多关于 PP-YOLO 的详细介绍可以参考:[PP-YOLO 模型](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README_cn.md)。
在服务端主体检测任务中,为了保证检测效果,我们使用 ResNet50vd-DCN 作为检测模型的骨干网络,使用配置文件 [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) ,更换为自定义的主体检测数据集,进行训练,最终得到检测模型。
## 3. 模型训练
本节主要介绍如何基于 PaddleDetection在自己的数据集上训练主体检测模型。
### 3.1 环境准备
下载PaddleDetection代码安装requirements。
```shell
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection
# 安装其他依赖
pip install -r requirements.txt
```
更多安装教程,请参考: [安装文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL_cn.md)
### 3.2 数据准备
对于自定义数据集首先需要将自己的数据集修改为COCO格式可以参考[自定义检测数据集教程](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/static/docs/tutorials/Custom_DataSet.md)制作COCO格式的数据集。
主体检测任务中,所有的检测框均属于前景,在这里需要将标注文件中,检测框的`category_id`修改为1同时将整个标注文件中的`categories`映射表修改为下面的格式,即整个类别映射表中只包含`前景`类别。
```json
[{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}]
```
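下面是一个示意脚本(标注文件路径为假设值),用于把已有的 COCO 标注批量映射为只含“前景”一个类别:

```python
import json

ann_file = "annotations/instances_train.json"   # 假设的标注文件路径

with open(ann_file, "r") as f:
    coco = json.load(f)

# 所有检测框都视为前景,类别 id 统一改为 1
for ann in coco["annotations"]:
    ann["category_id"] = 1

# 类别映射表中只保留“前景”一个类别
coco["categories"] = [{"id": 1, "name": "foreground", "supercategory": "foreground"}]

with open("annotations/instances_train_mainbody.json", "w") as f:
    json.dump(coco, f)
```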
### 3.3 配置文件改动和说明
我们使用 `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` 配置进行训练,配置文件摘要如下:
<div align='center'>
<img src='../../images/det/PaddleDetection_config.png' width='400'/>
</div>
从上图看到 `ppyolov2_r50vd_dcn_365e_coco.yml` 配置需要依赖其他的配置文件,这些配置文件的含义如下:
```
coco_detection.yml主要说明了训练数据和验证数据的路径
runtime.yml主要说明了公共的运行参数比如是否使用GPU、每多少个epoch存储checkpoint等
optimizer_365e.yml主要说明了学习率和优化器的配置
ppyolov2_r50vd_dcn.yml主要说明模型和主干网络的情况
ppyolov2_reader.yml主要说明数据读取器配置如 batch size并发加载子进程数等同时包含读取后预处理操作如resize、数据增强等等
```
在主体检测任务中,需要将 `datasets/coco_detection.yml` 中的 `num_classes` 参数修改为 1 (只有 1 个前景类别),同时将训练集和测试集的路径修改为自定义数据集的路径。
此外,也可以根据实际情况,修改上述文件,比如,如果显存溢出,可以将 batch size 和学习率等比缩小等。
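batch size 与学习率的等比缩放可以按线性缩放规则估算,示意如下(其中的原始 batch size 与学习率为假设值,请以实际配置文件为准):

```python
# 线性缩放规则batch size 缩小为原来的 1/N 时,学习率也等比缩小为 1/N
base_batch_size, base_lr = 24, 0.005      # 假设的原始配置
new_batch_size = 12                       # 显存不足时调小后的 batch size
new_lr = base_lr * new_batch_size / base_batch_size
print(new_lr)                             # 0.0025,写回 optimizer 配置即可
```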
### 3.4 启动训练
PaddleDetection 提供了单卡/多卡训练模式,满足用户多种训练需求。
* GPU 单卡训练
```bash
# windows和Mac下不需要执行该命令
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml
```
* GPU多卡训练
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval
```
--eval表示边训练边验证。
* (**推荐**)模型微调
如果希望加载 PaddleClas 中已经训练好的主体检测模型,在自己的数据集上进行模型微调,可以使用下面的命令进行训练。
```bash
export CUDA_VISIBLE_DEVICES=0
# 指定pretrain_weights参数加载通用的主体检测预训练模型
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pretrain_weights=https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams
```
* 模型恢复训练
在日常训练过程中,有的用户由于一些原因导致训练中断,可以使用 `-r` 的命令恢复训练:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval -r output/ppyolov2_r50vd_dcn_365e_coco/10000
```
注意:如果遇到 "`Out of memory error`" 问题, 尝试在 `ppyolov2_reader.yml` 文件中调小`batch_size`,同时等比例调小学习率。
### 3.5 模型预测与调试
使用下面的命令完成 PaddleDetection 的预测过程。
```bash
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final
```
`--draw_threshold` 是个可选参数。根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果。`keep_top_k` 表示设置输出目标的最大数量,默认值为 100 ,用户可以根据自己的实际情况进行设定。
### 3.6 模型导出与预测部署
执行导出模型脚本:
```bash
python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --output_dir=./inference -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final.pdparams
```
预测模型会导出到 `inference/ppyolov2_r50vd_dcn_365e_coco` 目录下,分别为 `infer_cfg.yml` (预测不需要), `model.pdiparams`, `model.pdiparams.info`, `model.pdmodel`
注意: `PaddleDetection` 导出的inference模型的文件格式为 `model.xxx`这里如果希望与PaddleClas的inference模型文件格式保持一致需要将其 `model.xxx` 文件修改为 `inference.xxx` 文件,用于后续主体检测的预测部署。
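重命名可以手动完成,也可以用类似下面的脚本批量处理(导出目录为假设值,请按实际路径修改):

```python
import os

export_dir = "inference/ppyolov2_r50vd_dcn_365e_coco"   # 假设的导出目录

for fname in os.listdir(export_dir):
    if fname.startswith("model."):
        new_name = "inference." + fname.split(".", 1)[1]   # model.pdmodel -> inference.pdmodel
        os.rename(os.path.join(export_dir, fname),
                  os.path.join(export_dir, new_name))
```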
更多模型导出教程,请参考: [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md)
最终,目录 `inference/ppyolov2_r50vd_dcn_365e_coco` 中包含 `inference.pdiparams`, `inference.pdiparams.info` 以及 `inference.pdmodel` 文件,其中 `inference.pdiparams` 为保存的 inference 模型权重文件, `inference.pdmodel` 为保存的 inference 模型结构文件。
导出模型之后,在主体检测与识别任务中,就可以将检测模型的路径更改为该 inference 模型路径,完成预测。
以商品识别为例,其配置文件为 [inference_product.yaml](../../../deploy/configs/inference_product.yaml) ,修改其中的 `Global.det_inference_model_dir` 字段为导出的主体检测 inference 模型目录,参考[图像识别快速开始教程](../tutorials/quick_start_recognition.md) ,即可完成商品检测与识别过程。
### FAQ
#### Q可以使用其他的主体检测模型结构吗
* A可以的但是目前的检测预处理过程仅适配了 PicoDet 以及 YOLO 系列的预处理,因此在使用的时候,建议优先使用这两个系列的模型进行训练,如果希望使用 Faster RCNN 等其他系列的模型,需要按照 PaddleDetection 的数据预处理,修改下预处理逻辑,这块如果您有需求或者有问题的话,欢迎提 issue 或者在微信群里反馈。
#### Q可以修改主体检测的预测尺度吗
* A可以的但是需要注意 2 个地方
* PaddleClas 中提供的主体检测模型是基于 `640x640` 的分辨率去训练的,因此预测的时候也是默认使用 `640x640` 的分辨率进行预测,使用其他分辨率预测的话,精度会有所降低。
* 在模型导出的时候,建议也修改下模型导出的分辨率,保持模型导出、模型预测的分辨率一致。

View File

@ -21,6 +21,7 @@ from ppcls.arch.backbone.legendary_models.resnet import ResNet18, ResNet18_vd, R
from ppcls.arch.backbone.legendary_models.vgg import VGG11, VGG13, VGG16, VGG19
from ppcls.arch.backbone.legendary_models.inception_v3 import InceptionV3
from ppcls.arch.backbone.legendary_models.hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W64_C
from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x0_25, PPLCNet_x0_35, PPLCNet_x0_5, PPLCNet_x0_75, PPLCNet_x1_0, PPLCNet_x1_5, PPLCNet_x2_0, PPLCNet_x2_5
from ppcls.arch.backbone.model_zoo.resnet_vc import ResNet50_vc
from ppcls.arch.backbone.model_zoo.resnext import ResNeXt50_32x4d, ResNeXt50_64x4d, ResNeXt101_32x4d, ResNeXt101_64x4d, ResNeXt152_32x4d, ResNeXt152_64x4d
@ -57,6 +58,7 @@ from ppcls.arch.backbone.model_zoo.dla import DLA34, DLA46_c, DLA46x_c, DLA60, D
from ppcls.arch.backbone.model_zoo.rednet import RedNet26, RedNet38, RedNet50, RedNet101, RedNet152
from ppcls.arch.backbone.model_zoo.tnt import TNT_small
from ppcls.arch.backbone.model_zoo.hardnet import HarDNet68, HarDNet85, HarDNet39_ds, HarDNet68_ds
from ppcls.arch.backbone.model_zoo.cspnet import CSPDarkNet53
from ppcls.arch.backbone.variant_models.resnet_variant import ResNet50_last_stage_stride1
from ppcls.arch.backbone.variant_models.vgg_variant import VGG19Sigmoid

View File

@ -0,0 +1,399 @@
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import, division, print_function
import paddle
import paddle.nn as nn
from paddle import ParamAttr
from paddle.nn import AdaptiveAvgPool2D, BatchNorm, Conv2D, Dropout, Linear
from paddle.regularizer import L2Decay
from paddle.nn.initializer import KaimingNormal
from ppcls.arch.backbone.base.theseus_layer import TheseusLayer
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
MODEL_URLS = {
"PPLCNet_x0_25":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams",
"PPLCNet_x0_35":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams",
"PPLCNet_x0_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams",
"PPLCNet_x0_75":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams",
"PPLCNet_x1_0":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams",
"PPLCNet_x1_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams",
"PPLCNet_x2_0":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams",
"PPLCNet_x2_5":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams"
}
__all__ = list(MODEL_URLS.keys())
# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
# k: kernel_size
# in_c: input channel number in depthwise block
# out_c: output channel number in depthwise block
# s: stride in depthwise block
# use_se: whether to use SE block
NET_CONFIG = {
"blocks2":
#k, in_c, out_c, s, use_se
[[3, 16, 32, 1, False]],
"blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
"blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
"blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False],
[5, 256, 256, 1, False], [5, 256, 256, 1, False]],
"blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
}
def make_divisible(v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNLayer(TheseusLayer):
def __init__(self,
num_channels,
filter_size,
num_filters,
stride,
num_groups=1):
super().__init__()
self.conv = Conv2D(
in_channels=num_channels,
out_channels=num_filters,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=num_groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm(
num_filters,
param_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
self.hardswish = nn.Hardswish()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
x = self.hardswish(x)
return x
class DepthwiseSeparable(TheseusLayer):
def __init__(self,
num_channels,
num_filters,
stride,
dw_size=3,
use_se=False):
super().__init__()
self.use_se = use_se
self.dw_conv = ConvBNLayer(
num_channels=num_channels,
num_filters=num_channels,
filter_size=dw_size,
stride=stride,
num_groups=num_channels)
if use_se:
self.se = SEModule(num_channels)
self.pw_conv = ConvBNLayer(
num_channels=num_channels,
filter_size=1,
num_filters=num_filters,
stride=1)
def forward(self, x):
x = self.dw_conv(x)
if self.use_se:
x = self.se(x)
x = self.pw_conv(x)
return x
class SEModule(TheseusLayer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(
in_channels=channel,
out_channels=channel // reduction,
kernel_size=1,
stride=1,
padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(
in_channels=channel // reduction,
out_channels=channel,
kernel_size=1,
stride=1,
padding=0)
self.hardsigmoid = nn.Hardsigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
class PPLCNet(TheseusLayer):
def __init__(self,
scale=1.0,
class_num=1000,
dropout_prob=0.2,
class_expand=1280):
super().__init__()
self.scale = scale
self.class_expand = class_expand
self.conv1 = ConvBNLayer(
num_channels=3,
filter_size=3,
num_filters=make_divisible(16 * scale),
stride=2)
self.blocks2 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
])
self.blocks3 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
])
self.blocks4 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
])
self.blocks5 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
])
self.blocks6 = nn.Sequential(*[
DepthwiseSeparable(
num_channels=make_divisible(in_c * scale),
num_filters=make_divisible(out_c * scale),
dw_size=k,
stride=s,
use_se=se)
for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
])
self.avg_pool = AdaptiveAvgPool2D(1)
self.last_conv = Conv2D(
in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
self.fc = Linear(self.class_expand, class_num)
def forward(self, x):
x = self.conv1(x)
x = self.blocks2(x)
x = self.blocks3(x)
x = self.blocks4(x)
x = self.blocks5(x)
x = self.blocks6(x)
x = self.avg_pool(x)
x = self.last_conv(x)
x = self.hardswish(x)
x = self.dropout(x)
x = self.flatten(x)
x = self.fc(x)
return x
def _load_pretrained(pretrained, model, model_url, use_ssld):
if pretrained is False:
pass
elif pretrained is True:
load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
elif isinstance(pretrained, str):
load_dygraph_pretrain(model, pretrained)
else:
raise RuntimeError(
"pretrained type is not available. Please use `string` or `boolean` type."
)
def PPLCNet_x0_25(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_25
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_25` model depends on args.
"""
model = PPLCNet(scale=0.25, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_25"], use_ssld)
return model
def PPLCNet_x0_35(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_35
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_35` model depends on args.
"""
model = PPLCNet(scale=0.35, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_35"], use_ssld)
return model
def PPLCNet_x0_5(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_5
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_5` model depends on args.
"""
model = PPLCNet(scale=0.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_5"], use_ssld)
return model
def PPLCNet_x0_75(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x0_75
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x0_75` model depends on args.
"""
model = PPLCNet(scale=0.75, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x0_75"], use_ssld)
return model
def PPLCNet_x1_0(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x1_0
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x1_0` model depends on args.
"""
model = PPLCNet(scale=1.0, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_0"], use_ssld)
return model
def PPLCNet_x1_5(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x1_5
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x1_5` model depends on args.
"""
model = PPLCNet(scale=1.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x1_5"], use_ssld)
return model
def PPLCNet_x2_0(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x2_0
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x2_0` model depends on args.
"""
model = PPLCNet(scale=2.0, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_0"], use_ssld)
return model
def PPLCNet_x2_5(pretrained=False, use_ssld=False, **kwargs):
"""
PPLCNet_x2_5
Args:
pretrained: bool or str, default False. If True, load the pretrained parameters;
if str, it is treated as the path of the pretrained model.
use_ssld: bool, default False. Whether to use the SSLD distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `PPLCNet_x2_5` model depends on args.
"""
model = PPLCNet(scale=2.5, **kwargs)
_load_pretrained(pretrained, model, MODEL_URLS["PPLCNet_x2_5"], use_ssld)
return model
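A minimal usage sketch for the factory functions above (assuming paddle is installed and this module is importable; the import path in the comment is illustrative, not taken from the diff):

import paddle
# hypothetical import path -- adjust to wherever this file lives in the repo
# from ppcls.arch.backbone.legendary_models.pp_lcnet import PPLCNet_x1_0

model = PPLCNet_x1_0(pretrained=False, class_num=1000)  # pretrained=True would download the weights in MODEL_URLS
model.eval()
x = paddle.rand([1, 3, 224, 224])
with paddle.no_grad():
    logits = model(x)
print(logits.shape)  # expected: [1, 1000]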

View File

@ -0,0 +1,374 @@
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle import ParamAttr
from ppcls.utils.save_load import load_dygraph_pretrain, load_dygraph_pretrain_from_url
MODEL_URLS = {
"CSPDarkNet53":
"https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/CSPDarkNet53_pretrained.pdparams"
}
MODEL_CFGS = {
"CSPDarkNet53": dict(
stem=dict(
out_chs=32, kernel_size=3, stride=1, pool=''),
stage=dict(
out_chs=(64, 128, 256, 512, 1024),
depth=(1, 2, 8, 8, 4),
stride=(2, ) * 5,
exp_ratio=(2., ) + (1., ) * 4,
bottle_ratio=(0.5, ) + (1.0, ) * 4,
block_ratio=(1., ) + (0.5, ) * 4,
down_growth=True, ))
}
__all__ = ['CSPDarkNet53'
] # model_registry will add each entrypoint fn to this
class ConvBnAct(nn.Layer):
def __init__(self,
input_channels,
output_channels,
kernel_size=1,
stride=1,
padding=None,
dilation=1,
groups=1,
act_layer=nn.LeakyReLU,
norm_layer=nn.BatchNorm2D):
super().__init__()
if padding is None:
padding = (kernel_size - 1) // 2
self.conv = nn.Conv2D(
in_channels=input_channels,
out_channels=output_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
weight_attr=ParamAttr(),
bias_attr=False)
self.bn = norm_layer(num_features=output_channels)
self.act = act_layer() if act_layer is not None else None
def forward(self, inputs):
x = self.conv(inputs)
x = self.bn(x)
if self.act is not None:
x = self.act(x)
return x
def create_stem(in_chans=3,
out_chs=32,
kernel_size=3,
stride=2,
pool='',
act_layer=None,
norm_layer=None):
stem = nn.Sequential()
if not isinstance(out_chs, (tuple, list)):
out_chs = [out_chs]
assert len(out_chs)
in_c = in_chans
for i, out_c in enumerate(out_chs):
conv_name = f'conv{i + 1}'
stem.add_sublayer(
conv_name,
ConvBnAct(
in_c,
out_c,
kernel_size,
stride=stride if i == 0 else 1,
act_layer=act_layer,
norm_layer=norm_layer))
in_c = out_c
last_conv = conv_name
if pool:
stem.add_sublayer(
'pool', nn.MaxPool2D(
kernel_size=3, stride=2, padding=1))
return stem, dict(
num_chs=in_c, reduction=stride, module='.'.join(['stem', last_conv]))
class DarkBlock(nn.Layer):
def __init__(self,
in_chs,
out_chs,
dilation=1,
bottle_ratio=0.5,
groups=1,
act_layer=nn.ReLU,
norm_layer=nn.BatchNorm2D,
attn_layer=None,
drop_block=None):
super(DarkBlock, self).__init__()
mid_chs = int(round(out_chs * bottle_ratio))
ckwargs = dict(act_layer=act_layer, norm_layer=norm_layer)
self.conv1 = ConvBnAct(in_chs, mid_chs, kernel_size=1, **ckwargs)
self.conv2 = ConvBnAct(
mid_chs,
out_chs,
kernel_size=3,
dilation=dilation,
groups=groups,
**ckwargs)
def forward(self, x):
shortcut = x
x = self.conv1(x)
x = self.conv2(x)
x = x + shortcut
return x
class CrossStage(nn.Layer):
def __init__(self,
in_chs,
out_chs,
stride,
dilation,
depth,
block_ratio=1.,
bottle_ratio=1.,
exp_ratio=1.,
groups=1,
first_dilation=None,
down_growth=False,
cross_linear=False,
block_dpr=None,
block_fn=DarkBlock,
**block_kwargs):
super(CrossStage, self).__init__()
first_dilation = first_dilation or dilation
down_chs = out_chs if down_growth else in_chs
exp_chs = int(round(out_chs * exp_ratio))
block_out_chs = int(round(out_chs * block_ratio))
conv_kwargs = dict(
act_layer=block_kwargs.get('act_layer'),
norm_layer=block_kwargs.get('norm_layer'))
if stride != 1 or first_dilation != dilation:
self.conv_down = ConvBnAct(
in_chs,
down_chs,
kernel_size=3,
stride=stride,
dilation=first_dilation,
groups=groups,
**conv_kwargs)
prev_chs = down_chs
else:
self.conv_down = None
prev_chs = in_chs
self.conv_exp = ConvBnAct(
prev_chs, exp_chs, kernel_size=1, **conv_kwargs)
prev_chs = exp_chs // 2 # output of conv_exp is always split in two
self.blocks = nn.Sequential()
for i in range(depth):
self.blocks.add_sublayer(
str(i),
block_fn(prev_chs, block_out_chs, dilation, bottle_ratio,
groups, **block_kwargs))
prev_chs = block_out_chs
# transition convs
self.conv_transition_b = ConvBnAct(
prev_chs, exp_chs // 2, kernel_size=1, **conv_kwargs)
self.conv_transition = ConvBnAct(
exp_chs, out_chs, kernel_size=1, **conv_kwargs)
def forward(self, x):
if self.conv_down is not None:
x = self.conv_down(x)
x = self.conv_exp(x)
split = x.shape[1] // 2
xs, xb = x[:, :split], x[:, split:]
xb = self.blocks(xb)
xb = self.conv_transition_b(xb)
out = self.conv_transition(paddle.concat([xs, xb], axis=1))
return out
class DarkStage(nn.Layer):
def __init__(self,
in_chs,
out_chs,
stride,
dilation,
depth,
block_ratio=1.,
bottle_ratio=1.,
groups=1,
first_dilation=None,
block_fn=DarkBlock,
block_dpr=None,
**block_kwargs):
super().__init__()
first_dilation = first_dilation or dilation
self.conv_down = ConvBnAct(
in_chs,
out_chs,
kernel_size=3,
stride=stride,
dilation=first_dilation,
groups=groups,
act_layer=block_kwargs.get('act_layer'),
norm_layer=block_kwargs.get('norm_layer'))
prev_chs = out_chs
block_out_chs = int(round(out_chs * block_ratio))
self.blocks = nn.Sequential()
for i in range(depth):
self.blocks.add_sublayer(
str(i),
block_fn(prev_chs, block_out_chs, dilation, bottle_ratio,
groups, **block_kwargs))
prev_chs = block_out_chs
def forward(self, x):
x = self.conv_down(x)
x = self.blocks(x)
return x
def _cfg_to_stage_args(cfg, curr_stride=2, output_stride=32):
# get per-stage args for each stage and its blocks; adjust strides/dilations to meet the target output_stride
num_stages = len(cfg['depth'])
if 'groups' not in cfg:
cfg['groups'] = (1, ) * num_stages
if 'down_growth' in cfg and not isinstance(cfg['down_growth'],
(list, tuple)):
cfg['down_growth'] = (cfg['down_growth'], ) * num_stages
stage_strides = []
stage_dilations = []
stage_first_dilations = []
dilation = 1
for cfg_stride in cfg['stride']:
stage_first_dilations.append(dilation)
if curr_stride >= output_stride:
dilation *= cfg_stride
stride = 1
else:
stride = cfg_stride
curr_stride *= stride
stage_strides.append(stride)
stage_dilations.append(dilation)
cfg['stride'] = stage_strides
cfg['dilation'] = stage_dilations
cfg['first_dilation'] = stage_first_dilations
stage_args = [
dict(zip(cfg.keys(), values)) for values in zip(*cfg.values())
]
return stage_args
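# For the CSPDarkNet53 config above (five stages, cfg stride (2,)*5, curr_stride=1 after the stem):
#   output_stride=32 -> per-stage stride (2, 2, 2, 2, 2), dilation (1, 1, 1, 1, 1)
#   output_stride=16 -> per-stage stride (2, 2, 2, 2, 1), dilation (1, 1, 1, 1, 2)
# i.e. once the running stride reaches the target, further downsampling is traded for dilation.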
class CSPNet(nn.Layer):
def __init__(self,
cfg,
in_chans=3,
class_num=1000,
output_stride=32,
global_pool='avg',
drop_rate=0.,
act_layer=nn.LeakyReLU,
norm_layer=nn.BatchNorm2D,
zero_init_last_bn=True,
stage_fn=CrossStage,
block_fn=DarkBlock):
super().__init__()
self.class_num = class_num
self.drop_rate = drop_rate
assert output_stride in (8, 16, 32)
layer_args = dict(act_layer=act_layer, norm_layer=norm_layer)
# Construct the stem
self.stem, stem_feat_info = create_stem(in_chans, **cfg['stem'],
**layer_args)
self.feature_info = [stem_feat_info]
prev_chs = stem_feat_info['num_chs']
curr_stride = stem_feat_info[
'reduction'] # reduction does not include pool
if cfg['stem']['pool']:
curr_stride *= 2
# Construct the stages
per_stage_args = _cfg_to_stage_args(
cfg['stage'], curr_stride=curr_stride, output_stride=output_stride)
self.stages = nn.LayerList()
for i, sa in enumerate(per_stage_args):
self.stages.add_sublayer(
str(i),
stage_fn(
prev_chs, **sa, **layer_args, block_fn=block_fn))
prev_chs = sa['out_chs']
curr_stride *= sa['stride']
self.feature_info += [
dict(
num_chs=prev_chs,
reduction=curr_stride,
module=f'stages.{i}')
]
# Construct the head
self.num_features = prev_chs
self.pool = nn.AdaptiveAvgPool2D(1)
self.flatten = nn.Flatten(1)
self.fc = nn.Linear(
prev_chs,
class_num,
weight_attr=ParamAttr(),
bias_attr=ParamAttr())
def forward(self, x):
x = self.stem(x)
for stage in self.stages:
x = stage(x)
x = self.pool(x)
x = self.flatten(x)
x = self.fc(x)
return x
def _load_pretrained(pretrained, model, model_url, use_ssld=False):
if pretrained is False:
pass
elif pretrained is True:
load_dygraph_pretrain_from_url(model, model_url, use_ssld=use_ssld)
elif isinstance(pretrained, str):
load_dygraph_pretrain(model, pretrained)
else:
raise RuntimeError(
"pretrained type is not available. Please use `string` or `boolean` type."
)
def CSPDarkNet53(pretrained=False, use_ssld=False, **kwargs):
model = CSPNet(MODEL_CFGS["CSPDarkNet53"], block_fn=DarkBlock, **kwargs)
_load_pretrained(
pretrained, model, MODEL_URLS["CSPDarkNet53"], use_ssld=use_ssld)
return model
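A quick sanity-check sketch for the model above (assumes paddle and the definitions in this file are in scope):

import paddle

model = CSPDarkNet53(pretrained=False, class_num=1000)
# feature_info tracks channels / total stride after the stem and each CSP stage
for info in model.feature_info:
    print(info["num_chs"], info["reduction"], info["module"])
x = paddle.rand([1, 3, 256, 256])   # the CSPDarkNet53 training config in this PR crops to 256x256
logits = model(x)
print(logits.shape)                  # expected: [1, 1000]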

View File

@ -131,7 +131,7 @@ class GoogLeNetDY(nn.Layer):
self._ince5b = Inception(
832, 832, 384, 192, 384, 48, 128, 128, name="ince5b")
self._pool_5 = AvgPool2D(kernel_size=7, stride=7)
self._pool_5 = AdaptiveAvgPool2D(1)
self._drop = Dropout(p=0.4, mode="downscale_in_infer")
self._fc_out = Linear(

View File

@ -24,30 +24,25 @@ class ArcMargin(nn.Layer):
margin=0.5,
scale=80.0,
easy_margin=False):
super(ArcMargin, self).__init__()
super().__init__()
self.embedding_size = embedding_size
self.class_num = class_num
self.margin = margin
self.scale = scale
self.easy_margin = easy_margin
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.XavierNormal())
self.fc = nn.Linear(
self.embedding_size,
self.class_num,
weight_attr=weight_attr,
bias_attr=False)
self.weight = self.create_parameter(
shape=[self.embedding_size, self.class_num],
is_bias=False,
default_initializer=paddle.nn.initializer.XavierNormal())
def forward(self, input, label=None):
input_norm = paddle.sqrt(
paddle.sum(paddle.square(input), axis=1, keepdim=True))
input = paddle.divide(input, input_norm)
weight = self.fc.weight
weight_norm = paddle.sqrt(
paddle.sum(paddle.square(weight), axis=0, keepdim=True))
weight = paddle.divide(weight, weight_norm)
paddle.sum(paddle.square(self.weight), axis=0, keepdim=True))
weight = paddle.divide(self.weight, weight_norm)
cos = paddle.matmul(input, weight)
if not self.training or label is None:
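The hunk above is truncated before the margin is applied. As a reference, here is a minimal standalone sketch of the ArcMargin-style logits that the new `self.weight` parameter feeds into, written with the same ops used in the diff; the function name is illustrative and easy_margin handling is omitted, so this is not the exact forward from the file:

import math
import paddle
import paddle.nn.functional as F

def arc_margin_logits(feat, weight, label, margin=0.5, scale=80.0):
    # L2-normalize features along dim 1 and the [embedding_size, class_num] weight along dim 0
    feat = paddle.divide(feat, paddle.sqrt(paddle.sum(paddle.square(feat), axis=1, keepdim=True)))
    weight = paddle.divide(weight, paddle.sqrt(paddle.sum(paddle.square(weight), axis=0, keepdim=True)))
    cos = paddle.matmul(feat, weight)                                  # cos(theta), shape [N, class_num]
    sin = paddle.sqrt(paddle.clip(1.0 - paddle.square(cos), min=0.0))
    phi = cos * math.cos(margin) - sin * math.sin(margin)              # cos(theta + margin)
    one_hot = F.one_hot(label.reshape([-1]), num_classes=cos.shape[1])
    return scale * (one_hot * phi + (1.0 - one_hot) * cos)             # margin only on the target class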

View File

@ -26,20 +26,19 @@ class CircleMargin(nn.Layer):
self.embedding_size = embedding_size
self.class_num = class_num
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.XavierNormal())
self.fc = paddle.nn.Linear(
self.embedding_size, self.class_num, weight_attr=weight_attr)
self.weight = self.create_parameter(
shape=[self.embedding_size, self.class_num],
is_bias=False,
default_initializer=paddle.nn.initializer.XavierNormal())
def forward(self, input, label):
feat_norm = paddle.sqrt(
paddle.sum(paddle.square(input), axis=1, keepdim=True))
input = paddle.divide(input, feat_norm)
weight = self.fc.weight
weight_norm = paddle.sqrt(
paddle.sum(paddle.square(weight), axis=0, keepdim=True))
weight = paddle.divide(weight, weight_norm)
paddle.sum(paddle.square(self.weight), axis=0, keepdim=True))
weight = paddle.divide(self.weight, weight_norm)
logits = paddle.matmul(input, weight)
if not self.training or label is None:
@ -49,9 +48,9 @@ class CircleMargin(nn.Layer):
alpha_n = paddle.clip(logits.detach() + self.margin, min=0.)
delta_p = 1 - self.margin
delta_n = self.margin
m_hot = F.one_hot(label.reshape([-1]), num_classes=logits.shape[1])
logits_p = alpha_p * (logits - delta_p)
logits_n = alpha_n * (logits - delta_n)
pre_logits = logits_p * m_hot + logits_n * (1 - m_hot)

View File

@ -25,13 +25,10 @@ class CosMargin(paddle.nn.Layer):
self.embedding_size = embedding_size
self.class_num = class_num
weight_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.XavierNormal())
self.fc = nn.Linear(
self.embedding_size,
self.class_num,
weight_attr=weight_attr,
bias_attr=False)
self.weight = self.create_parameter(
shape=[self.embedding_size, self.class_num],
is_bias=False,
default_initializer=paddle.nn.initializer.XavierNormal())
def forward(self, input, label):
label.stop_gradient = True
@ -40,15 +37,14 @@ class CosMargin(paddle.nn.Layer):
paddle.sum(paddle.square(input), axis=1, keepdim=True))
input = paddle.divide(input, input_norm)
weight = self.fc.weight
weight_norm = paddle.sqrt(
paddle.sum(paddle.square(weight), axis=0, keepdim=True))
weight = paddle.divide(weight, weight_norm)
paddle.sum(paddle.square(self.weight), axis=0, keepdim=True))
weight = paddle.divide(self.weight, weight_norm)
cos = paddle.matmul(input, weight)
if not self.training or label is None:
return cos
cos_m = cos - self.margin
one_hot = paddle.nn.functional.one_hot(label, self.class_num)

View File

@ -0,0 +1,148 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 100
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
eval_mode: retrieval
use_dali: False
to_static: False
# model architecture
Arch:
name: RecModel
infer_output_key: features
infer_add_softmax: False
Backbone:
name: PPLCNet_x2_5
pretrained: True
use_ssld: True
BackboneStopLayer:
name: flatten_0
Neck:
name: FC
embedding_size: 1280
class_num: 512
Head:
name: ArcMargin
embedding_size: 512
class_num: 185341
margin: 0.2
scale: 30
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.04
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00001
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/train_reg_all_data.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 256
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
Query:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Gallery:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/
cls_label_path: ./dataset/Aliproduct/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Metric:
Eval:
- Recallk:
topk: [1, 5]

View File

@ -34,9 +34,8 @@ Optimizer:
momentum: 0.9
lr:
name: Piecewise
learning_rate: 0.01
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
values: [0.01, 0.001, 0.0001, 0.00001]
regularizer:
name: 'L2'
coeff: 0.0001

View File

@ -0,0 +1,131 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# training model under @to_static
to_static: False
# model architecture
Arch:
name: CSPDarkNet53
class_num: 1000
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Piecewise
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 256
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 288
- CropImage:
size: 256
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 288
- CropImage:
size: 256
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -54,18 +56,39 @@ DataLoader:
to_rgb: True
channel_first: False
- RandCropImage:
size: 384
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -82,7 +105,9 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -107,7 +132,9 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -54,18 +56,39 @@ DataLoader:
to_rgb: True
channel_first: False
- RandCropImage:
size: 384
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -82,7 +105,9 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -107,7 +132,9 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 426
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -22,25 +22,27 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token pos_embed dist_token
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 1e-3
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
DataLoader:
@ -55,17 +57,38 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: True
loader:
@ -83,6 +106,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -92,7 +117,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 256
drop_last: False
shuffle: False
loader:
@ -108,6 +133,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -122,9 +149,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_25
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_35
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x0_75
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x1_0
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00003
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x1_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x2_0
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,130 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
class_num: 1000
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 360
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
# model architecture
Arch:
name: PPLCNet_x2_5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.8
warmup_epoch: 5
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- AutoAugment:
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 512
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: docs/images/whl/demo.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: Topk
topk: 5
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -84,7 +110,11 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -92,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -107,7 +137,11 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -120,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]
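This diff swaps the Momentum optimizer with a Piecewise schedule for AdamW with a Cosine schedule (peak 5e-4, eta_min 1e-5, 20 warmup epochs starting from 1e-6). A rough per-epoch sketch of such a warmup-plus-cosine curve, assuming linear warmup and epoch-level decay; the exact step-wise schedule implemented in ppcls.optimizer may differ:
import math
def lr_at_epoch(epoch, epochs=300, learning_rate=5e-4, eta_min=1e-5,
                warmup_epoch=20, warmup_start_lr=1e-6):
    # Linear warmup from warmup_start_lr up to the peak learning_rate
    if epoch < warmup_epoch:
        return warmup_start_lr + (learning_rate - warmup_start_lr) * epoch / warmup_epoch
    # Cosine decay from the peak down to eta_min over the remaining epochs
    progress = (epoch - warmup_epoch) / (epochs - warmup_epoch)
    return eta_min + 0.5 * (learning_rate - eta_min) * (1.0 + math.cos(math.pi * progress))
for e in (0, 10, 20, 160, 299):
    print(e, f"{lr_at_epoch(e):.2e}")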

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 384
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -84,7 +110,11 @@ DataLoader:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -92,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -107,7 +137,11 @@ Infer:
to_rgb: True
channel_first: False
- ResizeImage:
size: [384, 384]
resize_short: 438
interpolation: bicubic
backend: pil
- CropImage:
size: 384
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
@ -120,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -24,24 +24,28 @@ Arch:
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: absolute_pos_embed relative_position_bias_table .bias norm
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 20
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +61,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +111,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +122,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +138,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +154,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: alt_gvt_base
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.3
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: alt_gvt_large
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: alt_gvt_small
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.2
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: pcpvt_base
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.3
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: pcpvt_large
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.5
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -7,7 +7,7 @@ Global:
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 120
epochs: 300
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
@ -20,28 +20,34 @@ Global:
Arch:
name: pcpvt_small
class_num: 1000
drop_rate: 0.0
drop_path_rate: 0.2
# loss function config for training/eval process
Loss:
Train:
- CELoss:
- MixCELoss:
weight: 1.0
epsilon: 0.1
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
name: AdamW
beta1: 0.9
beta2: 0.999
epsilon: 1e-8
weight_decay: 0.05
no_weight_decay_name: norm cls_token proj.0.weight proj.1.weight proj.2.weight proj.3.weight pos_block
one_dim_param_no_weight_decay: True
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
regularizer:
name: 'L2'
coeff: 0.0001
name: Cosine
learning_rate: 5e-4
eta_min: 1e-5
warmup_epoch: 5
warmup_start_lr: 1e-6
# data loader for train and eval
@ -57,17 +63,39 @@ DataLoader:
channel_first: False
- RandCropImage:
size: 224
interpolation: bicubic
backend: pil
- RandFlipImage:
flip_code: 1
- TimmAutoAugment:
config_str: rand-m9-mstd0.5-inc1
interpolation: bicubic
img_size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.25
sl: 0.02
sh: 1.0/3.0
r1: 0.3
attempt: 10
use_log_aspect: True
mode: pixel
batch_transform_ops:
- OpSampler:
MixupOperator:
alpha: 0.8
prob: 0.5
CutmixOperator:
alpha: 1.0
prob: 0.5
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: True
loader:
@ -85,6 +113,8 @@ DataLoader:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -94,7 +124,7 @@ DataLoader:
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
batch_size: 128
drop_last: False
shuffle: False
loader:
@ -110,6 +140,8 @@ Infer:
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: bicubic
backend: pil
- CropImage:
size: 224
- NormalizeImage:
@ -124,9 +156,6 @@ Infer:
class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -54,7 +54,7 @@ Optimizer:
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
learning_rate: 0.04
regularizer:
name: 'L2'
coeff: 0.0001
@ -84,10 +84,10 @@ DataLoader:
- RandomErasing:
EPSILON: 0.5
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
drop_last: False
sample_per_id: 2
drop_last: True
loader:
num_workers: 6
@ -97,7 +97,7 @@ DataLoader:
dataset:
name: LogoDataset
image_root: "dataset/LogoDet-3K-crop/val/"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+query.txt"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+val.txt"
transform_ops:
- DecodeImage:
to_rgb: True
@ -122,7 +122,7 @@ DataLoader:
dataset:
name: LogoDataset
image_root: "dataset/LogoDet-3K-crop/train/"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+gallery.txt"
cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
transform_ops:
- DecodeImage:
to_rgb: True

View File

@ -54,7 +54,7 @@ Optimizer:
momentum: 0.9
lr:
name: MultiStepDecay
learning_rate: 0.01
learning_rate: 0.04
milestones: [30, 60, 70, 80, 90, 100]
gamma: 0.5
verbose: False
@ -90,10 +90,10 @@ DataLoader:
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 64
num_instances: 2
drop_last: False
sample_per_id: 2
drop_last: True
shuffle: True
loader:
num_workers: 4

View File

@ -51,12 +51,8 @@ Optimizer:
name: Momentum
momentum: 0.9
lr:
name: MultiStepDecay
name: Cosine
learning_rate: 0.01
milestones: [30, 60, 70, 80, 90, 100, 120, 140]
gamma: 0.5
verbose: False
last_epoch: -1
regularizer:
name: 'L2'
coeff: 0.0005
@ -91,9 +87,9 @@ DataLoader:
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
sample_per_id: 2
drop_last: False
shuffle: True
loader:

View File

@ -53,8 +53,7 @@ Optimizer:
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
last_epoch: -1
learning_rate: 0.04
regularizer:
name: 'L2'
coeff: 0.0005
@ -89,10 +88,10 @@ DataLoader:
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
drop_last: False
sample_per_id: 2
drop_last: True
shuffle: True
loader:
num_workers: 6

View File

@ -0,0 +1,129 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: ./output/
device: gpu
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 10
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference
use_multilabel: True
# model architecture
Arch:
name: MobileNetV1
class_num: 33
pretrained: True
# loss function config for training/eval process
Loss:
Train:
- MultiLabelLoss:
weight: 1.0
Eval:
- MultiLabelLoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.1
regularizer:
name: 'L2'
coeff: 0.00004
# data loader for train and eval
DataLoader:
Train:
dataset:
name: MultiLabelDataset
image_root: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/
cls_label_path: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/multilabel_train_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- RandCropImage:
size: 224
- RandFlipImage:
flip_code: 1
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: True
loader:
num_workers: 4
use_shared_memory: True
Eval:
dataset:
name: MultiLabelDataset
image_root: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/images/
cls_label_path: ./dataset/NUS-WIDE-SCENE/NUS-SCENE-dataset/multilabel_test_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 256
drop_last: False
shuffle: False
loader:
num_workers: 4
use_shared_memory: True
Infer:
infer_imgs: ./deploy/images/0517_2715693311.jpg
batch_size: 10
transforms:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
PostProcess:
name: MultiLabelTopk
topk: 5
class_id_map_file: None
Metric:
Train:
- HammingDistance:
- AccuracyScore:
Eval:
- HammingDistance:
- AccuracyScore:
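This multi-label config pairs MultiLabelLoss with HammingDistance/AccuracyScore metrics over 33 scene classes. A hedged numpy sketch of the usual sigmoid-plus-0.5-threshold decision and a Hamming-style score on made-up data (illustrative; not the ppcls metric implementation):
import numpy as np
num_classes = 33                                              # class_num in the config above
logits = np.random.randn(2, num_classes).astype(np.float32)  # made-up model outputs
probs = 1.0 / (1.0 + np.exp(-logits))                         # sigmoid: independent per-class scores
preds = (probs >= 0.5).astype(np.int64)                       # multi-hot prediction
labels = np.random.randint(0, 2, size=(2, num_classes))       # made-up multi-hot targets
hamming_distance = float((preds != labels).mean())            # fraction of mismatched label bits
print(preds.sum(axis=1), hamming_distance)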

View File

@ -0,0 +1,135 @@
# global configs
Global:
checkpoints: null
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams"
output_dir: "./output_vehicle_cls_prune/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 160
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
Slim:
prune:
name: fpgm
pruned_ratio: 0.3
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "ResNet50_last_stage_stride1"
pretrained: True
BackboneStopLayer:
name: "adaptive_avg_pool2d_0"
Neck:
name: "VehicleNeck"
in_channels: 2048
out_channels: 512
Head:
name: "ArcMargin"
embedding_size: 512
class_num: 431
margin: 0.15
scale: 32
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
- SupConLoss:
weight: 1.0
views: 2
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.01
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
bbox_crop: True
cls_label_path: "./dataset/CompCars/train_test_split/classification/train_label.txt"
transform_ops:
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- AugMix:
prob: 0.5
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: PKSampler
batch_size: 128
sample_per_id: 2
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
cls_label_path: "./dataset/CompCars/train_test_split/classification/test_label.txt"
bbox_crop: True
transform_ops:
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: False
loader:
num_workers: 8
use_shared_memory: True
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -0,0 +1,134 @@
# global configs
Global:
checkpoints: null
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_cls_ResNet50_CompCars_v1.2_pretrained.pdparams"
output_dir: "./output_vehicle_cls_pact/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 80
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
Slim:
quant:
name: pact
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "ResNet50_last_stage_stride1"
pretrained: True
BackboneStopLayer:
name: "adaptive_avg_pool2d_0"
Neck:
name: "VehicleNeck"
in_channels: 2048
out_channels: 512
Head:
name: "ArcMargin"
embedding_size: 512
class_num: 431
margin: 0.15
scale: 32
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
- SupConLoss:
weight: 1.0
views: 2
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.001
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
bbox_crop: True
cls_label_path: "./dataset/CompCars/train_test_split/classification/train_label.txt"
transform_ops:
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- AugMix:
prob: 0.5
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: PKSampler
batch_size: 64
sample_per_id: 2
drop_last: False
shuffle: True
loader:
num_workers: 8
use_shared_memory: True
Eval:
dataset:
name: "CompCars"
image_root: "./dataset/CompCars/image/"
label_root: "./dataset/CompCars/label/"
cls_label_path: "./dataset/CompCars/train_test_split/classification/test_label.txt"
bbox_crop: True
transform_ops:
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 128
drop_last: False
shuffle: False
loader:
num_workers: 8
use_shared_memory: True
Metric:
Train:
- TopkAcc:
topk: [1, 5]
Eval:
- TopkAcc:
topk: [1, 5]

View File

@ -1,8 +1,8 @@
# global configs
Global:
checkpoints: null
pretrained_model: null
output_dir: "./output/"
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams"
output_dir: "./output_vehicle_reid_prune/"
device: "gpu"
save_interval: 1
eval_during_train: True
@ -61,7 +61,6 @@ Optimizer:
lr:
name: Cosine
learning_rate: 0.01
last_epoch: -1
regularizer:
name: 'L2'
coeff: 0.0005
@ -96,9 +95,9 @@ DataLoader:
mean: [0., 0., 0.]
sampler:
name: DistributedRandomIdentitySampler
name: PKSampler
batch_size: 128
num_instances: 2
sample_per_id: 2
drop_last: False
shuffle: True
loader:

View File

@ -0,0 +1,161 @@
# global configs
Global:
checkpoints: null
pretrained_model: "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/vehicle_reid_ResNet50_VERIWild_v1.1_pretrained.pdparams"
output_dir: "./output_vehicle_reid_pact/"
device: "gpu"
save_interval: 1
eval_during_train: True
eval_interval: 1
epochs: 40
print_batch_step: 10
use_visualdl: False
# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: "./inference"
eval_mode: "retrieval"
# for quantization or pruning the model
Slim:
## for prune
quant:
name: pact
# model architecture
Arch:
name: "RecModel"
infer_output_key: "features"
infer_add_softmax: False
Backbone:
name: "ResNet50_last_stage_stride1"
pretrained: True
BackboneStopLayer:
name: "adaptive_avg_pool2d_0"
Neck:
name: "VehicleNeck"
in_channels: 2048
out_channels: 512
Head:
name: "ArcMargin"
embedding_size: 512
class_num: 30671
margin: 0.15
scale: 32
# loss function config for training/eval process
Loss:
Train:
- CELoss:
weight: 1.0
- SupConLoss:
weight: 1.0
views: 2
Eval:
- CELoss:
weight: 1.0
Optimizer:
name: Momentum
momentum: 0.9
lr:
name: Cosine
learning_rate: 0.001
regularizer:
name: 'L2'
coeff: 0.0005
# data loader for train and eval
DataLoader:
Train:
dataset:
name: "VeriWild"
image_root: "./dataset/VeRI-Wild/images/"
cls_label_path: "./dataset/VeRI-Wild/train_test_split/train_list_start0.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- RandFlipImage:
flip_code: 1
- AugMix:
prob: 0.5
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- RandomErasing:
EPSILON: 0.5
sl: 0.02
sh: 0.4
r1: 0.3
mean: [0., 0., 0.]
sampler:
name: PKSampler
batch_size: 64
sample_per_id: 2
drop_last: False
shuffle: True
loader:
num_workers: 6
use_shared_memory: True
Eval:
Query:
dataset:
name: "VeriWild"
image_root: "./dataset/VeRI-Wild/images"
cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id_query.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 6
use_shared_memory: True
Gallery:
dataset:
name: "VeriWild"
image_root: "./dataset/VeRI-Wild/images"
cls_label_path: "./dataset/VeRI-Wild/train_test_split/test_3000_id.txt"
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
sampler:
name: DistributedBatchSampler
batch_size: 64
drop_last: False
shuffle: False
loader:
num_workers: 6
use_shared_memory: True
Metric:
Eval:
- Recallk:
topk: [1, 5]
- mAP: {}

View File

@ -26,9 +26,12 @@ from ppcls.data.dataloader.common_dataset import create_operators
from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
from ppcls.data.dataloader.logo_dataset import LogoDataset
from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
from ppcls.data.dataloader.mix_dataset import MixDataset
# sampler
from ppcls.data.dataloader.DistributedRandomIdentitySampler import DistributedRandomIdentitySampler
from ppcls.data.dataloader.pk_sampler import PKSampler
from ppcls.data.dataloader.mix_sampler import MixSampler
from ppcls.data import preprocess
from ppcls.data.preprocess import transform

View File

@ -0,0 +1,9 @@
from ppcls.data.dataloader.imagenet_dataset import ImageNetDataset
from ppcls.data.dataloader.multilabel_dataset import MultiLabelDataset
from ppcls.data.dataloader.common_dataset import create_operators
from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
from ppcls.data.dataloader.logo_dataset import LogoDataset
from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
from ppcls.data.dataloader.mix_dataset import MixDataset
from ppcls.data.dataloader.mix_sampler import MixSampler
from ppcls.data.dataloader.pk_sampler import PKSampler

View File

@ -0,0 +1,49 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import numpy as np
import os
from paddle.io import Dataset
from .. import dataloader
class MixDataset(Dataset):
def __init__(self, datasets_config):
super().__init__()
self.dataset_list = []
start_idx = 0
end_idx = 0
for config_i in datasets_config:
dataset_name = config_i.pop('name')
dataset = getattr(dataloader, dataset_name)(**config_i)
end_idx += len(dataset)
self.dataset_list.append([end_idx, start_idx, dataset])
start_idx = end_idx
self.length = end_idx
def __getitem__(self, idx):
for dataset_i in self.dataset_list:
if dataset_i[0] > idx:
dataset_i_idx = idx - dataset_i[1]
return dataset_i[2][dataset_i_idx]
def __len__(self):
return self.length
def get_dataset_list(self):
return self.dataset_list
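MixDataset above concatenates several sub-datasets and resolves a global index through the cumulative [end_idx, start_idx, dataset] bookkeeping. A self-contained sketch of that lookup, with stand-in lists instead of real ppcls datasets:
# Stand-in lists instead of real ppcls datasets; only the index bookkeeping is shown.
sub_datasets = [["a0", "a1", "a2"], ["b0", "b1"]]
dataset_list = []
start_idx = end_idx = 0
for ds in sub_datasets:
    end_idx += len(ds)
    dataset_list.append([end_idx, start_idx, ds])  # same [end, start, dataset] layout as MixDataset
    start_idx = end_idx
def lookup(idx):
    # Walk the cumulative ranges until the one containing idx is found
    for end, start, ds in dataset_list:
        if end > idx:
            return ds[idx - start]
print([lookup(i) for i in range(end_idx)])  # ['a0', 'a1', 'a2', 'b0', 'b1']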

View File

@ -0,0 +1,79 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from paddle.io import DistributedBatchSampler, Sampler
from ppcls.utils import logger
from ppcls.data.dataloader.mix_dataset import MixDataset
from ppcls.data import dataloader
class MixSampler(DistributedBatchSampler):
def __init__(self, dataset, batch_size, sample_configs, iter_per_epoch):
super().__init__(dataset, batch_size)
assert isinstance(dataset,
MixDataset), "MixSampler only support MixDataset"
self.sampler_list = []
self.batch_size = batch_size
self.start_list = []
self.length = iter_per_epoch
dataset_list = dataset.get_dataset_list()
batch_size_left = self.batch_size
self.iter_list = []
for i, config_i in enumerate(sample_configs):
self.start_list.append(dataset_list[i][1])
sample_method = config_i.pop("name")
ratio_i = config_i.pop("ratio")
if i < len(sample_configs) - 1:
batch_size_i = int(self.batch_size * ratio_i)
batch_size_left -= batch_size_i
else:
batch_size_i = batch_size_left
assert batch_size_i <= len(dataset_list[i][2])
config_i["batch_size"] = batch_size_i
if sample_method == "DistributedBatchSampler":
sampler_i = DistributedBatchSampler(dataset_list[i][2],
**config_i)
else:
sampler_i = getattr(dataloader, sample_method)(
dataset_list[i][2], **config_i)
self.sampler_list.append(sampler_i)
self.iter_list.append(iter(sampler_i))
self.length += len(dataset_list[i][2]) * ratio_i
self.iter_counter = 0
def __iter__(self):
while self.iter_counter < self.length:
batch = []
for i, iter_i in enumerate(self.iter_list):
batch_i = next(iter_i, None)
if batch_i is None:
iter_i = iter(self.sampler_list[i])
self.iter_list[i] = iter_i
batch_i = next(iter_i, None)
assert batch_i is not None, "dataset {} returns None".format(
i)
batch += [idx + self.start_list[i] for idx in batch_i]
if len(batch) == self.batch_size:
self.iter_counter += 1
yield batch
else:
logger.info("Some dataset reaches end")
self.iter_counter = 0
def __len__(self):
return self.length

View File

@ -33,7 +33,7 @@ class MultiLabelDataset(CommonDataset):
with open(self._cls_path) as fd:
lines = fd.readlines()
for l in lines:
l = l.strip().split(" ")
l = l.strip().split("\t")
self.images.append(os.path.join(self._img_root, l[0]))
labels = l[1].split(',')
@ -44,13 +44,14 @@ class MultiLabelDataset(CommonDataset):
def __getitem__(self, idx):
try:
img = cv2.imread(self.images[idx])
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
with open(self.images[idx], 'rb') as f:
img = f.read()
if self._transform_ops:
img = transform(img, self._transform_ops)
img = img.transpose((2, 0, 1))
label = np.array(self.labels[idx]).astype("float32")
return (img, label)
except Exception as ex:
logger.error("Exception occured when parse line: {} with msg: {}".
format(self.images[idx], ex))

View File

@ -0,0 +1,105 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from collections import defaultdict
import numpy as np
import random
from paddle.io import DistributedBatchSampler
from ppcls.utils import logger
class PKSampler(DistributedBatchSampler):
"""
First, randomly sample P identities.
Then for each identity randomly sample K instances.
Therefore the batch size is P*K, and the sampler is called PKSampler.
Args:
dataset (paddle.io.Dataset): list of (img_path, pid, cam_id).
sample_per_id(int): number of instances per identity in a batch.
batch_size (int): number of examples in a batch.
shuffle(bool): whether to shuffle indices order before generating
batch indices. Default True.
"""
def __init__(self,
dataset,
batch_size,
sample_per_id,
shuffle=True,
drop_last=True,
sample_method="sample_avg_prob"):
super().__init__(
dataset, batch_size, shuffle=shuffle, drop_last=drop_last)
assert batch_size % sample_per_id == 0, \
"PKSampler configs error, Sample_per_id must be a divisor of batch_size."
assert hasattr(self.dataset,
"labels"), "Dataset must have labels attribute."
self.sample_per_label = sample_per_id
self.label_dict = defaultdict(list)
self.sample_method = sample_method
for idx, label in enumerate(self.dataset.labels):
self.label_dict[label].append(idx)
self.label_list = list(self.label_dict)
assert len(self.label_list) * self.sample_per_label > self.batch_size, \
"batch size should be smaller than "
if self.sample_method == "id_avg_prob":
self.prob_list = np.array([1 / len(self.label_list)] *
len(self.label_list))
elif self.sample_method == "sample_avg_prob":
counter = []
for label_i in self.label_list:
counter.append(len(self.label_dict[label_i]))
self.prob_list = np.array(counter) / sum(counter)
else:
logger.error(
"PKSampler only support id_avg_prob and sample_avg_prob sample method, "
"but receive {}.".format(self.sample_method))
diff = np.abs(sum(self.prob_list) - 1)
if diff > 0.00000001:
self.prob_list[-1] = 1 - sum(self.prob_list[:-1])
if self.prob_list[-1] > 1 or self.prob_list[-1] < 0:
logger.error("PKSampler prob list error")
else:
logger.info(
"PKSampler: sum of prob list not equal to 1, diff is {}, change the last prob".format(diff)
)
def __iter__(self):
label_per_batch = self.batch_size // self.sample_per_label
for _ in range(len(self)):
batch_index = []
batch_label_list = np.random.choice(
self.label_list,
size=label_per_batch,
replace=False,
p=self.prob_list)
for label_i in batch_label_list:
label_i_indexes = self.label_dict[label_i]
if self.sample_per_label <= len(label_i_indexes):
batch_index.extend(
np.random.choice(
label_i_indexes,
size=self.sample_per_label,
replace=False))
else:
batch_index.extend(
np.random.choice(
label_i_indexes,
size=self.sample_per_label,
replace=True))
if not self.drop_last or len(batch_index) == self.batch_size:
yield batch_index
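The PKSampler above builds each batch from P identities with K (sample_per_id) instances each. A standalone numpy sketch of one batch draw using the same sample_avg_prob weighting on made-up labels (illustrative; the real sampler also inherits distributed sharding from DistributedBatchSampler):
import numpy as np
from collections import defaultdict
labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]      # made-up identity labels for 10 samples
batch_size, sample_per_id = 4, 2             # P = batch_size // sample_per_id identities per batch
label_dict = defaultdict(list)
for idx, label in enumerate(labels):
    label_dict[label].append(idx)
label_list = list(label_dict)
# "sample_avg_prob": identities are drawn proportionally to their number of samples
counter = np.array([len(label_dict[l]) for l in label_list], dtype=np.float64)
prob_list = counter / counter.sum()
chosen = np.random.choice(label_list, size=batch_size // sample_per_id,
                          replace=False, p=prob_list)
batch_index = []
for label_i in chosen:
    pool = label_dict[label_i]
    batch_index.extend(np.random.choice(pool, size=sample_per_id,
                                        replace=len(pool) < sample_per_id))
print(batch_index)  # 4 indices: 2 identities x 2 instances each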

View File

@ -16,7 +16,7 @@ import importlib
from . import topk
from .topk import Topk
from .topk import Topk, MultiLabelTopk
def build_postprocess(config):

View File

@ -45,15 +45,17 @@ class Topk(object):
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None):
def __call__(self, x, file_names=None, multilabel=False):
assert isinstance(x, paddle.Tensor)
if file_names is not None:
assert x.shape[0] == len(file_names)
x = F.softmax(x, axis=-1)
x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
x = x.numpy()
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32")
index = probs.argsort(axis=0)[-self.topk:][::-1].astype(
"int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
@ -73,3 +75,11 @@ class Topk(object):
result["label_names"] = label_name_list
y.append(result)
return y
class MultiLabelTopk(Topk):
def __init__(self, topk=1, class_id_map_file=None):
super().__init__()
def __call__(self, x, file_names=None):
return super().__call__(x, file_names, multilabel=True)
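The change above keeps softmax top-k selection for single-label outputs and switches to sigmoid with a fixed 0.5 threshold when multilabel=True. A small numpy sketch of the two index selections on made-up probabilities:
import numpy as np
# Single-label path: softmax probabilities, keep the top-k class indices
probs_softmax = np.array([0.05, 0.60, 0.25, 0.10])   # made-up values
topk = 2
print(probs_softmax.argsort(axis=0)[-topk:][::-1].astype("int32"))   # [1 2]
# Multi-label path: sigmoid probabilities, keep every class at or above 0.5
probs_sigmoid = np.array([0.9, 0.2, 0.7, 0.4])        # made-up values
print(np.where(probs_sigmoid >= 0.5)[0].astype("int32"))             # [0 2]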

View File

@ -14,6 +14,7 @@
from ppcls.data.preprocess.ops.autoaugment import ImageNetPolicy as RawImageNetPolicy
from ppcls.data.preprocess.ops.randaugment import RandAugment as RawRandAugment
from ppcls.data.preprocess.ops.timm_autoaugment import RawTimmAutoAugment
from ppcls.data.preprocess.ops.cutout import Cutout
from ppcls.data.preprocess.ops.hide_and_seek import HideAndSeek
@ -29,9 +30,8 @@ from ppcls.data.preprocess.ops.operators import NormalizeImage
from ppcls.data.preprocess.ops.operators import ToCHWImage
from ppcls.data.preprocess.ops.operators import AugMix
from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, FmixOperator
from ppcls.data.preprocess.batch_ops.batch_operators import MixupOperator, CutmixOperator, OpSampler, FmixOperator
import six
import numpy as np
from PIL import Image
@ -45,21 +45,16 @@ def transform(data, ops=[]):
class AutoAugment(RawImageNetPolicy):
""" ImageNetPolicy wrapper to auto fit different img types """
def __init__(self, *args, **kwargs):
if six.PY2:
super(AutoAugment, self).__init__(*args, **kwargs)
else:
super().__init__(*args, **kwargs)
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
if six.PY2:
img = super(AutoAugment, self).__call__(img)
else:
img = super().__call__(img)
img = super().__call__(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
@ -69,21 +64,35 @@ class AutoAugment(RawImageNetPolicy):
class RandAugment(RawRandAugment):
""" RandAugment wrapper to auto fit different img types """
def __init__(self, *args, **kwargs):
if six.PY2:
super(RandAugment, self).__init__(*args, **kwargs)
else:
super().__init__(*args, **kwargs)
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
if six.PY2:
img = super(RandAugment, self).__call__(img)
else:
img = super().__call__(img)
img = super().__call__(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
class TimmAutoAugment(RawTimmAutoAugment):
""" TimmAutoAugment wrapper to auto fit different img tyeps. """
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super().__call__(img)
if isinstance(img, Image.Image):
img = np.asarray(img)

View File

@ -16,13 +16,17 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import random
import numpy as np
from ppcls.utils import logger
from ppcls.data.preprocess.ops.fmix import sample_mask
class BatchOperator(object):
""" BatchOperator """
def __init__(self, *args, **kwargs):
pass
@ -46,9 +50,20 @@ class BatchOperator(object):
class MixupOperator(BatchOperator):
""" Mixup operator """
def __init__(self, alpha=0.2):
assert alpha > 0., \
'parameter alpha[%f] should > 0.0' % (alpha)
def __init__(self, alpha: float=1.):
"""Build Mixup operator
Args:
alpha (float, optional): The parameter alpha of mixup. Defaults to 1.0.
Raises:
Exception: The value of parameter is illegal.
"""
if alpha <= 0:
raise Exception(
f"Parameter \"alpha\" of Mixup should be greater than 0. \"alpha\": {alpha}."
)
self._alpha = alpha
def __call__(self, batch):
@ -62,9 +77,20 @@ class MixupOperator(BatchOperator):
class CutmixOperator(BatchOperator):
""" Cutmix operator """
def __init__(self, alpha=0.2):
assert alpha > 0., \
'parameter alpha[%f] should > 0.0' % (alpha)
"""Build Cutmix operator
Args:
alpha (float, optional): The parameter alpha of cutmix. Defaults to 0.2.
Raises:
Exception: The value of parameter is illegal.
"""
if alpha <= 0:
raise Exception(
f"Parameter \"alpha\" of Cutmix should be greater than 0. \"alpha\": {alpha}."
)
self._alpha = alpha
def _rand_bbox(self, size, lam):
@ -72,8 +98,8 @@ class CutmixOperator(BatchOperator):
w = size[2]
h = size[3]
cut_rat = np.sqrt(1. - lam)
cut_w = np.int(w * cut_rat)
cut_h = np.int(h * cut_rat)
cut_w = int(w * cut_rat)
cut_h = int(h * cut_rat)
# uniform
cx = np.random.randint(w)
@ -101,6 +127,7 @@ class CutmixOperator(BatchOperator):
class FmixOperator(BatchOperator):
""" Fmix operator """
def __init__(self, alpha=1, decay_power=3, max_soft=0., reformulate=False):
self._alpha = alpha
self._decay_power = decay_power
@ -115,3 +142,42 @@ class FmixOperator(BatchOperator):
size, self._max_soft, self._reformulate)
imgs = mask * imgs + (1 - mask) * imgs[idx]
return list(zip(imgs, labels, labels[idx], [lam] * bs))
class OpSampler(object):
""" Sample a operator from """
def __init__(self, **op_dict):
"""Build OpSampler
Raises:
Exception: The parameter \"prob\" of the operator(s) is set incorrectly.
"""
if len(op_dict) < 1:
msg = f"ConfigWarning: No operator in \"OpSampler\". \"OpSampler\" has been skipped."
self.ops = {}
total_prob = 0
for op_name in op_dict:
param = op_dict[op_name]
if "prob" not in param:
msg = f"ConfigWarning: Parameter \"prob\" should be set when use operator in \"OpSampler\". The operator \"{op_name}\"'s prob has been set \"0\"."
logger.warning(msg)
prob = param.pop("prob", 0)
total_prob += prob
op = eval(op_name)(**param)
self.ops.update({op: prob})
if total_prob > 1:
msg = f"ConfigError: The total prob of operators in \"OpSampler\" should be less 1."
logger.error(msg)
raise Exception(msg)
# add "None Op" when total_prob < 1, "None Op" do nothing
self.ops[None] = 1 - total_prob
def __call__(self, batch):
op = random.choices(
list(self.ops.keys()), weights=list(self.ops.values()), k=1)[0]
# return batch directly when None Op
return op(batch) if op else batch
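OpSampler draws one batch operator per batch with random.choices and gives an implicit no-op the remaining probability mass. A hedged sketch of that draw plus the core mixup blend it would trigger, with a made-up batch and the alpha=0.8 from the configs above (the real MixupOperator also pairs labels and returns lam for the loss):
import random
import numpy as np
# Probabilities from the config above: Mixup 0.5, Cutmix 0.5, so the implicit no-op gets 0
ops = {"MixupOperator": 0.5, "CutmixOperator": 0.5, None: 0.0}
sampled = random.choices(list(ops.keys()), weights=list(ops.values()), k=1)[0]
print("sampled op:", sampled)
# Core mixup blend: a Beta-distributed weight mixes the batch with a shuffled copy of itself
alpha = 0.8
imgs = np.random.rand(4, 3, 224, 224).astype("float32")   # made-up batch
lam = np.random.beta(alpha, alpha)
perm = np.random.permutation(imgs.shape[0])
mixed = lam * imgs + (1 - lam) * imgs[perm]
print(mixed.shape, round(float(lam), 3))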

View File

@ -19,15 +19,62 @@ from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from functools import partial
import six
import math
import random
import cv2
import numpy as np
from PIL import Image
from paddle.vision.transforms import ColorJitter as RawColorJitter
from .autoaugment import ImageNetPolicy
from .functional import augmentations
from ppcls.utils import logger
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
logger.warning(
f"The backend of Resize only support \"cv2\" or \"PIL\". \"f{backend}\" is unavailable. Use \"cv2\" instead."
)
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
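A hedged usage sketch of UnifiedResize; the import path below is an assumption and may differ in your checkout. Both backends accept an HWC uint8 array and a (w, h) target size.

import numpy as np
from ppcls.data.preprocess.ops.operators import UnifiedResize  # assumed path

img = np.zeros((240, 320, 3), dtype=np.uint8)                # dummy image
resize_cv2 = UnifiedResize(interpolation="bilinear", backend="cv2")
resize_pil = UnifiedResize(interpolation="bilinear", backend="pil")
assert resize_cv2(img, (224, 224)).shape[:2] == (224, 224)
assert resize_pil(img, (224, 224)).shape[:2] == (224, 224)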
class OperatorParamError(ValueError):
@ -67,8 +114,11 @@ class DecodeImage(object):
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size=None,
resize_short=None,
interpolation=None,
backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
@ -81,6 +131,9 @@ class ResizeImage(object):
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
@ -90,10 +143,7 @@ class ResizeImage(object):
else:
w = self.w
h = self.h
if self.interpolation is None:
return cv2.resize(img, (w, h))
else:
return cv2.resize(img, (w, h), interpolation=self.interpolation)
return self._resize_func(img, (w, h))
class CropImage(object):
@ -119,9 +169,12 @@ class CropImage(object):
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=-1):
self.interpolation = interpolation if interpolation >= 0 else None
def __init__(self,
size,
scale=None,
ratio=None,
interpolation=None,
backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
@ -130,6 +183,9 @@ class RandCropImage(object):
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(
interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
@ -155,10 +211,8 @@ class RandCropImage(object):
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
if self.interpolation is None:
return cv2.resize(img, size)
else:
return cv2.resize(img, size, interpolation=self.interpolation)
return self._resize_func(img, size)
class RandFlipImage(object):
@ -313,3 +367,20 @@ class AugMix(object):
mixed = (1 - m) * image + m * mix
return mixed.astype(np.uint8)
class ColorJitter(RawColorJitter):
"""ColorJitter.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super()._apply_image(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
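A hedged sketch of what the wrapper above does for ndarray inputs: convert to PIL, apply paddle.vision's ColorJitter, and convert back. Parameter values are illustrative only.

import numpy as np
from PIL import Image
from paddle.vision.transforms import ColorJitter as RawColorJitter

img = np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
jitter = RawColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)
out = np.asarray(jitter(Image.fromarray(img)))  # same ndarray -> PIL -> ndarray round trip as the wrapper
print(out.shape)  # (224, 224, 3)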

View File

@ -12,7 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#This code is based on https://github.com/zhunzhong07/Random-Erasing
#This code is adapted from https://github.com/zhunzhong07/Random-Erasing, with reference to Timm.
from functools import partial
import math
import random
@ -20,36 +22,69 @@ import random
import numpy as np
class Pixels(object):
def __init__(self, mode="const", mean=[0., 0., 0.]):
self._mode = mode
self._mean = mean
def __call__(self, h=224, w=224, c=3):
if self._mode == "rand":
return np.random.normal(size=(1, 1, 3))
elif self._mode == "pixel":
return np.random.normal(size=(h, w, c))
elif self._mode == "const":
return self._mean
else:
raise Exception(
"Invalid mode in RandomErasing, only support \"const\", \"rand\", \"pixel\""
)
class RandomErasing(object):
def __init__(self, EPSILON=0.5, sl=0.02, sh=0.4, r1=0.3,
mean=[0., 0., 0.]):
self.EPSILON = EPSILON
self.mean = mean
self.sl = sl
self.sh = sh
self.r1 = r1
"""RandomErasing.
"""
def __init__(self,
EPSILON=0.5,
sl=0.02,
sh=0.4,
r1=0.3,
mean=[0., 0., 0.],
attempt=100,
use_log_aspect=False,
mode='const'):
self.EPSILON = eval(EPSILON) if isinstance(EPSILON, str) else EPSILON
self.sl = eval(sl) if isinstance(sl, str) else sl
self.sh = eval(sh) if isinstance(sh, str) else sh
r1 = eval(r1) if isinstance(r1, str) else r1
self.r1 = (math.log(r1), math.log(1 / r1)) if use_log_aspect else (
r1, 1 / r1)
self.use_log_aspect = use_log_aspect
self.attempt = attempt
self.get_pixels = Pixels(mode, mean)
def __call__(self, img):
if random.uniform(0, 1) > self.EPSILON:
if random.random() > self.EPSILON:
return img
for _ in range(100):
for _ in range(self.attempt):
area = img.shape[0] * img.shape[1]
target_area = random.uniform(self.sl, self.sh) * area
aspect_ratio = random.uniform(self.r1, 1 / self.r1)
aspect_ratio = random.uniform(*self.r1)
if self.use_log_aspect:
aspect_ratio = math.exp(aspect_ratio)
h = int(round(math.sqrt(target_area * aspect_ratio)))
w = int(round(math.sqrt(target_area / aspect_ratio)))
if w < img.shape[1] and h < img.shape[0]:
pixels = self.get_pixels(h, w, img.shape[2])
x1 = random.randint(0, img.shape[0] - h)
y1 = random.randint(0, img.shape[1] - w)
if img.shape[0] == 3:
img[x1:x1 + h, y1:y1 + w, 0] = self.mean[0]
img[x1:x1 + h, y1:y1 + w, 1] = self.mean[1]
img[x1:x1 + h, y1:y1 + w, 2] = self.mean[2]
if img.shape[2] == 3:
img[x1:x1 + h, y1:y1 + w, :] = pixels
else:
img[0, x1:x1 + h, y1:y1 + w] = self.mean[1]
img[x1:x1 + h, y1:y1 + w, 0] = pixels[0]
return img
return img
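A hedged usage sketch of the extended RandomErasing; the import path is an assumption. With mode="pixel" the erased rectangle is filled with per-pixel Gaussian noise, and use_log_aspect samples the aspect ratio in log space.

import numpy as np
from ppcls.data.preprocess.ops.random_erasing import RandomErasing  # assumed path

eraser = RandomErasing(EPSILON=1.0, sl=0.02, sh=0.4, r1=0.3,
                       mode="pixel", use_log_aspect=True)
img = np.random.randint(0, 255, (224, 224, 3)).astype(np.float32)
out = eraser(img)  # one random rectangle replaced by noise (EPSILON=1.0 forces erasing)
print(out.shape)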

View File

@ -0,0 +1,879 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This code is borrowed from Timm: https://github.com/rwightman/pytorch-image-models.
hacked together by / Copyright 2020 Ross Wightman
"""
import random
import math
import re
from PIL import Image, ImageOps, ImageEnhance, ImageChops
import PIL
import numpy as np
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
_PIL_VER = tuple([int(x) for x in PIL.__version__.split('.')[:2]])
_FILL = (128, 128, 128)
# This signifies the max integer that the controller RNN could predict for the
# augmentation scheme.
_MAX_LEVEL = 10.
_HPARAMS_DEFAULT = dict(
translate_const=250,
img_mean=_FILL, )
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
def _pil_interp(method):
if method == 'bicubic':
return Image.BICUBIC
elif method == 'lanczos':
return Image.LANCZOS
elif method == 'hamming':
return Image.HAMMING
else:
# default bilinear, do we want to allow nearest?
return Image.BILINEAR
def _interpolation(kwargs):
interpolation = kwargs.pop('resample', Image.BILINEAR)
if isinstance(interpolation, (list, tuple)):
return random.choice(interpolation)
else:
return interpolation
def _check_args_tf(kwargs):
if 'fillcolor' in kwargs and _PIL_VER < (5, 0):
kwargs.pop('fillcolor')
kwargs['resample'] = _interpolation(kwargs)
def shear_x(img, factor, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, factor, 0, 0, 1, 0),
**kwargs)
def shear_y(img, factor, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, factor, 1, 0),
**kwargs)
def translate_x_rel(img, pct, **kwargs):
pixels = pct * img.size[0]
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
**kwargs)
def translate_y_rel(img, pct, **kwargs):
pixels = pct * img.size[1]
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
**kwargs)
def translate_x_abs(img, pixels, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0),
**kwargs)
def translate_y_abs(img, pixels, **kwargs):
_check_args_tf(kwargs)
return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels),
**kwargs)
def rotate(img, degrees, **kwargs):
_check_args_tf(kwargs)
if _PIL_VER >= (5, 2):
return img.rotate(degrees, **kwargs)
elif _PIL_VER >= (5, 0):
w, h = img.size
post_trans = (0, 0)
rotn_center = (w / 2.0, h / 2.0)
angle = -math.radians(degrees)
matrix = [
round(math.cos(angle), 15),
round(math.sin(angle), 15),
0.0,
round(-math.sin(angle), 15),
round(math.cos(angle), 15),
0.0,
]
def transform(x, y, matrix):
(a, b, c, d, e, f) = matrix
return a * x + b * y + c, d * x + e * y + f
matrix[2], matrix[5] = transform(-rotn_center[0] - post_trans[0],
-rotn_center[1] - post_trans[1],
matrix)
matrix[2] += rotn_center[0]
matrix[5] += rotn_center[1]
return img.transform(img.size, Image.AFFINE, matrix, **kwargs)
else:
return img.rotate(degrees, resample=kwargs['resample'])
def auto_contrast(img, **__):
return ImageOps.autocontrast(img)
def invert(img, **__):
return ImageOps.invert(img)
def equalize(img, **__):
return ImageOps.equalize(img)
def solarize(img, thresh, **__):
return ImageOps.solarize(img, thresh)
def solarize_add(img, add, thresh=128, **__):
lut = []
for i in range(256):
if i < thresh:
lut.append(min(255, i + add))
else:
lut.append(i)
if img.mode in ("L", "RGB"):
if img.mode == "RGB" and len(lut) == 256:
lut = lut + lut + lut
return img.point(lut)
else:
return img
def posterize(img, bits_to_keep, **__):
if bits_to_keep >= 8:
return img
return ImageOps.posterize(img, bits_to_keep)
def contrast(img, factor, **__):
return ImageEnhance.Contrast(img).enhance(factor)
def color(img, factor, **__):
return ImageEnhance.Color(img).enhance(factor)
def brightness(img, factor, **__):
return ImageEnhance.Brightness(img).enhance(factor)
def sharpness(img, factor, **__):
return ImageEnhance.Sharpness(img).enhance(factor)
def _randomly_negate(v):
"""With 50% prob, negate the value"""
return -v if random.random() > 0.5 else v
def _rotate_level_to_arg(level, _hparams):
# range [-30, 30]
level = (level / _MAX_LEVEL) * 30.
level = _randomly_negate(level)
return level,
def _enhance_level_to_arg(level, _hparams):
# range [0.1, 1.9]
return (level / _MAX_LEVEL) * 1.8 + 0.1,
def _enhance_increasing_level_to_arg(level, _hparams):
# the 'no change' level is 1.0, moving away from that towards 0. or 2.0 increases the enhancement blend
# range [0.1, 1.9]
level = (level / _MAX_LEVEL) * .9
level = 1.0 + _randomly_negate(level)
return level,
def _shear_level_to_arg(level, _hparams):
# range [-0.3, 0.3]
level = (level / _MAX_LEVEL) * 0.3
level = _randomly_negate(level)
return level,
def _translate_abs_level_to_arg(level, hparams):
translate_const = hparams['translate_const']
level = (level / _MAX_LEVEL) * float(translate_const)
level = _randomly_negate(level)
return level,
def _translate_rel_level_to_arg(level, hparams):
# default range [-0.45, 0.45]
translate_pct = hparams.get('translate_pct', 0.45)
level = (level / _MAX_LEVEL) * translate_pct
level = _randomly_negate(level)
return level,
def _posterize_level_to_arg(level, _hparams):
# As per Tensorflow TPU EfficientNet impl
# range [0, 4], 'keep 0 up to 4 MSB of original image'
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 4),
def _posterize_increasing_level_to_arg(level, hparams):
# As per Tensorflow models research and UDA impl
# range [4, 0], 'keep 4 down to 0 MSB of original image',
# intensity/severity of augmentation increases with level
return 4 - _posterize_level_to_arg(level, hparams)[0],
def _posterize_original_level_to_arg(level, _hparams):
# As per original AutoAugment paper description
# range [4, 8], 'keep 4 up to 8 MSB of image'
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 4) + 4,
def _solarize_level_to_arg(level, _hparams):
# range [0, 256]
# intensity/severity of augmentation decreases with level
return int((level / _MAX_LEVEL) * 256),
def _solarize_increasing_level_to_arg(level, _hparams):
# range [0, 256]
# intensity/severity of augmentation increases with level
return 256 - _solarize_level_to_arg(level, _hparams)[0],
def _solarize_add_level_to_arg(level, _hparams):
# range [0, 110]
return int((level / _MAX_LEVEL) * 110),
LEVEL_TO_ARG = {
'AutoContrast': None,
'Equalize': None,
'Invert': None,
'Rotate': _rotate_level_to_arg,
# There are several variations of the posterize level scaling in various Tensorflow/Google repositories/papers
'Posterize': _posterize_level_to_arg,
'PosterizeIncreasing': _posterize_increasing_level_to_arg,
'PosterizeOriginal': _posterize_original_level_to_arg,
'Solarize': _solarize_level_to_arg,
'SolarizeIncreasing': _solarize_increasing_level_to_arg,
'SolarizeAdd': _solarize_add_level_to_arg,
'Color': _enhance_level_to_arg,
'ColorIncreasing': _enhance_increasing_level_to_arg,
'Contrast': _enhance_level_to_arg,
'ContrastIncreasing': _enhance_increasing_level_to_arg,
'Brightness': _enhance_level_to_arg,
'BrightnessIncreasing': _enhance_increasing_level_to_arg,
'Sharpness': _enhance_level_to_arg,
'SharpnessIncreasing': _enhance_increasing_level_to_arg,
'ShearX': _shear_level_to_arg,
'ShearY': _shear_level_to_arg,
'TranslateX': _translate_abs_level_to_arg,
'TranslateY': _translate_abs_level_to_arg,
'TranslateXRel': _translate_rel_level_to_arg,
'TranslateYRel': _translate_rel_level_to_arg,
}
NAME_TO_OP = {
'AutoContrast': auto_contrast,
'Equalize': equalize,
'Invert': invert,
'Rotate': rotate,
'Posterize': posterize,
'PosterizeIncreasing': posterize,
'PosterizeOriginal': posterize,
'Solarize': solarize,
'SolarizeIncreasing': solarize,
'SolarizeAdd': solarize_add,
'Color': color,
'ColorIncreasing': color,
'Contrast': contrast,
'ContrastIncreasing': contrast,
'Brightness': brightness,
'BrightnessIncreasing': brightness,
'Sharpness': sharpness,
'SharpnessIncreasing': sharpness,
'ShearX': shear_x,
'ShearY': shear_y,
'TranslateX': translate_x_abs,
'TranslateY': translate_y_abs,
'TranslateXRel': translate_x_rel,
'TranslateYRel': translate_y_rel,
}
class AugmentOp(object):
def __init__(self, name, prob=0.5, magnitude=10, hparams=None):
hparams = hparams or _HPARAMS_DEFAULT
self.aug_fn = NAME_TO_OP[name]
self.level_fn = LEVEL_TO_ARG[name]
self.prob = prob
self.magnitude = magnitude
self.hparams = hparams.copy()
self.kwargs = dict(
fillcolor=hparams['img_mean'] if 'img_mean' in hparams else _FILL,
resample=hparams['interpolation']
if 'interpolation' in hparams else _RANDOM_INTERPOLATION, )
# If magnitude_std is > 0, we introduce some randomness
# in the usually fixed policy and sample magnitude from a normal distribution
# with mean `magnitude` and std-dev of `magnitude_std`.
# NOTE This is my own hack, being tested, not in papers or reference impls.
self.magnitude_std = self.hparams.get('magnitude_std', 0)
def __call__(self, img):
if self.prob < 1.0 and random.random() > self.prob:
return img
magnitude = self.magnitude
if self.magnitude_std and self.magnitude_std > 0:
magnitude = random.gauss(magnitude, self.magnitude_std)
magnitude = min(_MAX_LEVEL, max(0, magnitude)) # clip to valid range
level_args = self.level_fn(
magnitude, self.hparams) if self.level_fn is not None else tuple()
return self.aug_fn(img, *level_args, **self.kwargs)
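To make the magnitude scaling above concrete, a small worked sketch (plain arithmetic, assuming the default _MAX_LEVEL of 10):

magnitude = 9
rotate_deg = (magnitude / 10.) * 30.            # 27.0 degrees, randomly negated at call time
enhance_factor = (magnitude / 10.) * 1.8 + 0.1  # 1.72 for Color/Contrast/Brightness/Sharpness
shear = (magnitude / 10.) * 0.3                 # 0.27 for ShearX/ShearY
print(rotate_deg, enhance_factor, shear)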
def auto_augment_policy_v0(hparams):
# ImageNet v0 policy from TPU EfficientNet impl, cannot find a paper reference.
policy = [
[('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
[('Color', 0.4, 9), ('Equalize', 0.6, 3)],
[('Color', 0.4, 1), ('Rotate', 0.6, 8)],
[('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
[('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
[('Color', 0.2, 0), ('Equalize', 0.8, 8)],
[('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
[('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
[('Color', 0.6, 1), ('Equalize', 1.0, 2)],
[('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
[('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
[('Color', 0.4, 7), ('Equalize', 0.6, 0)],
[('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)],
[('Solarize', 0.6, 8), ('Color', 0.6, 9)],
[('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
[('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
[('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
[('ShearY', 0.8, 0), ('Color', 0.6, 4)],
[('Color', 1.0, 0), ('Rotate', 0.6, 2)],
[('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
[('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
[('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
[('Posterize', 0.8, 2), ('Solarize', 0.6, 10)
], # This results in a black image with TPU posterize
[('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
[('Color', 0.8, 6), ('Rotate', 0.4, 5)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_v0r(hparams):
# ImageNet v0 policy from TPU EfficientNet impl, with variation of Posterize used
# in Google research implementation (number of bits discarded increases with magnitude)
policy = [
[('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
[('Color', 0.4, 9), ('Equalize', 0.6, 3)],
[('Color', 0.4, 1), ('Rotate', 0.6, 8)],
[('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
[('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
[('Color', 0.2, 0), ('Equalize', 0.8, 8)],
[('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
[('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
[('Color', 0.6, 1), ('Equalize', 1.0, 2)],
[('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
[('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
[('Color', 0.4, 7), ('Equalize', 0.6, 0)],
[('PosterizeIncreasing', 0.4, 6), ('AutoContrast', 0.4, 7)],
[('Solarize', 0.6, 8), ('Color', 0.6, 9)],
[('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
[('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
[('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
[('ShearY', 0.8, 0), ('Color', 0.6, 4)],
[('Color', 1.0, 0), ('Rotate', 0.6, 2)],
[('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
[('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
[('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
[('PosterizeIncreasing', 0.8, 2), ('Solarize', 0.6, 10)],
[('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
[('Color', 0.8, 6), ('Rotate', 0.4, 5)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_original(hparams):
# ImageNet policy from https://arxiv.org/abs/1805.09501
policy = [
[('PosterizeOriginal', 0.4, 8), ('Rotate', 0.6, 9)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
[('PosterizeOriginal', 0.6, 7), ('PosterizeOriginal', 0.6, 6)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
[('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
[('PosterizeOriginal', 0.8, 5), ('Equalize', 1.0, 2)],
[('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
[('Equalize', 0.6, 8), ('PosterizeOriginal', 0.4, 6)],
[('Rotate', 0.8, 8), ('Color', 0.4, 0)],
[('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
[('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Rotate', 0.8, 8), ('Color', 1.0, 2)],
[('Color', 0.8, 8), ('Solarize', 0.8, 7)],
[('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
[('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
[('Color', 0.4, 0), ('Equalize', 0.6, 3)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy_originalr(hparams):
# ImageNet policy from https://arxiv.org/abs/1805.09501 with research posterize variation
policy = [
[('PosterizeIncreasing', 0.4, 8), ('Rotate', 0.6, 9)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
[('PosterizeIncreasing', 0.6, 7), ('PosterizeIncreasing', 0.6, 6)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
[('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
[('PosterizeIncreasing', 0.8, 5), ('Equalize', 1.0, 2)],
[('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
[('Equalize', 0.6, 8), ('PosterizeIncreasing', 0.4, 6)],
[('Rotate', 0.8, 8), ('Color', 0.4, 0)],
[('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
[('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Rotate', 0.8, 8), ('Color', 1.0, 2)],
[('Color', 0.8, 8), ('Solarize', 0.8, 7)],
[('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
[('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
[('Color', 0.4, 0), ('Equalize', 0.6, 3)],
[('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
[('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
[('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
[('Color', 0.6, 4), ('Contrast', 1.0, 8)],
[('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
]
pc = [[AugmentOp(*a, hparams=hparams) for a in sp] for sp in policy]
return pc
def auto_augment_policy(name='v0', hparams=None):
hparams = hparams or _HPARAMS_DEFAULT
if name == 'original':
return auto_augment_policy_original(hparams)
elif name == 'originalr':
return auto_augment_policy_originalr(hparams)
elif name == 'v0':
return auto_augment_policy_v0(hparams)
elif name == 'v0r':
return auto_augment_policy_v0r(hparams)
else:
assert False, 'Unknown AA policy (%s)' % name
class AutoAugment(object):
def __init__(self, policy):
self.policy = policy
def __call__(self, img):
sub_policy = random.choice(self.policy)
for op in sub_policy:
img = op(img)
return img
def auto_augment_transform(config_str, hparams):
"""
Create an AutoAugment transform
:param config_str: String defining configuration of auto augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the AutoAugment policy (one of 'v0', 'v0r', 'original', 'originalr').
The remaining sections, which are not order specific, determine
'mstd' - float std deviation of magnitude noise applied
Ex 'original-mstd0.5' results in AutoAugment with original policy, magnitude_std 0.5
:param hparams: Other hparams (kwargs) for the AutoAugmentation scheme
:return: A callable Transform Op
"""
config = config_str.split('-')
policy_name = config[0]
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
else:
assert False, 'Unknown AutoAugment config section'
aa_policy = auto_augment_policy(policy_name, hparams=hparams)
return AutoAugment(aa_policy)
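A hedged usage sketch of the config string described in the docstring above; the import path and hparams values are assumptions.

from PIL import Image
from ppcls.data.preprocess.ops.timm_autoaugment import auto_augment_transform  # assumed path

hparams = dict(translate_const=100, img_mean=(128, 128, 128))
aa = auto_augment_transform("original-mstd0.5", hparams)  # original policy, magnitude_std 0.5
out = aa(Image.new("RGB", (224, 224), (127, 127, 127)))   # one random sub-policy (two ops) applied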
_RAND_TRANSFORMS = [
'AutoContrast',
'Equalize',
'Invert',
'Rotate',
'Posterize',
'Solarize',
'SolarizeAdd',
'Color',
'Contrast',
'Brightness',
'Sharpness',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
#'Cutout' # NOTE I've implemented this as random erasing separately
]
_RAND_INCREASING_TRANSFORMS = [
'AutoContrast',
'Equalize',
'Invert',
'Rotate',
'PosterizeIncreasing',
'SolarizeIncreasing',
'SolarizeAdd',
'ColorIncreasing',
'ContrastIncreasing',
'BrightnessIncreasing',
'SharpnessIncreasing',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
#'Cutout' # NOTE I've implemented this as random erasing separately
]
# These experimental weights are based loosely on the relative improvements mentioned in the paper.
# They may not result in increased performance, but could likely be tuned to do so.
_RAND_CHOICE_WEIGHTS_0 = {
'Rotate': 0.3,
'ShearX': 0.2,
'ShearY': 0.2,
'TranslateXRel': 0.1,
'TranslateYRel': 0.1,
'Color': .025,
'Sharpness': 0.025,
'AutoContrast': 0.025,
'Solarize': .005,
'SolarizeAdd': .005,
'Contrast': .005,
'Brightness': .005,
'Equalize': .005,
'Posterize': 0,
'Invert': 0,
}
def _select_rand_weights(weight_idx=0, transforms=None):
transforms = transforms or _RAND_TRANSFORMS
assert weight_idx == 0 # only one set of weights currently
rand_weights = _RAND_CHOICE_WEIGHTS_0
probs = [rand_weights[k] for k in transforms]
probs /= np.sum(probs)
return probs
def rand_augment_ops(magnitude=10, hparams=None, transforms=None):
hparams = hparams or _HPARAMS_DEFAULT
transforms = transforms or _RAND_TRANSFORMS
return [
AugmentOp(
name, prob=0.5, magnitude=magnitude, hparams=hparams)
for name in transforms
]
class RandAugment(object):
def __init__(self, ops, num_layers=2, choice_weights=None):
self.ops = ops
self.num_layers = num_layers
self.choice_weights = choice_weights
def __call__(self, img):
# no replacement when using weighted choice
ops = np.random.choice(
self.ops,
self.num_layers,
replace=self.choice_weights is None,
p=self.choice_weights)
for op in ops:
img = op(img)
return img
def rand_augment_transform(config_str, hparams):
"""
Create a RandAugment transform
:param config_str: String defining configuration of random augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the specific variant of rand augment (currently only 'rand'). The remaining
sections, which are not order specific, determine
'm' - integer magnitude of rand augment
'n' - integer num layers (number of transform ops selected per image)
'w' - integer probability weight index (index of a set of weights to influence choice of op)
'mstd' - float std deviation of magnitude noise applied
'inc' - integer (bool), use augmentations that increase in severity with magnitude (default: 0)
Ex 'rand-m9-n3-mstd0.5' results in RandAugment with magnitude 9, num_layers 3, magnitude_std 0.5
'rand-mstd1-w0' results in magnitude_std 1.0, weights 0, default magnitude of 10 and num_layers 2
:param hparams: Other hparams (kwargs) for the RandAugmentation scheme
:return: A callable Transform Op
"""
magnitude = _MAX_LEVEL # default to _MAX_LEVEL for magnitude (currently 10)
num_layers = 2 # default to 2 ops per image
weight_idx = None # default to no probability weights for op choice
transforms = _RAND_TRANSFORMS
config = config_str.split('-')
assert config[0] == 'rand'
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
elif key == 'inc':
if bool(val):
transforms = _RAND_INCREASING_TRANSFORMS
elif key == 'm':
magnitude = int(val)
elif key == 'n':
num_layers = int(val)
elif key == 'w':
weight_idx = int(val)
else:
assert False, 'Unknown RandAugment config section'
ra_ops = rand_augment_ops(
magnitude=magnitude, hparams=hparams, transforms=transforms)
choice_weights = None if weight_idx is None else _select_rand_weights(
weight_idx)
return RandAugment(ra_ops, num_layers, choice_weights=choice_weights)
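A hedged usage sketch of the RandAugment config grammar documented above; the import path and values are assumptions. "rand-m9-mstd0.5-inc1" selects the increasing-severity transform set with magnitude 9 and magnitude_std 0.5.

from PIL import Image
from ppcls.data.preprocess.ops.timm_autoaugment import rand_augment_transform  # assumed path

hparams = dict(translate_const=100, img_mean=(128, 128, 128))
ra = rand_augment_transform("rand-m9-mstd0.5-inc1", hparams)
out = ra(Image.new("RGB", (224, 224), (127, 127, 127)))  # num_layers=2 random ops applied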
_AUGMIX_TRANSFORMS = [
'AutoContrast',
'ColorIncreasing', # not in paper
'ContrastIncreasing', # not in paper
'BrightnessIncreasing', # not in paper
'SharpnessIncreasing', # not in paper
'Equalize',
'Rotate',
'PosterizeIncreasing',
'SolarizeIncreasing',
'ShearX',
'ShearY',
'TranslateXRel',
'TranslateYRel',
]
def augmix_ops(magnitude=10, hparams=None, transforms=None):
hparams = hparams or _HPARAMS_DEFAULT
transforms = transforms or _AUGMIX_TRANSFORMS
return [
AugmentOp(
name, prob=1.0, magnitude=magnitude, hparams=hparams)
for name in transforms
]
class AugMixAugment(object):
""" AugMix Transform
Adapted and improved from impl here: https://github.com/google-research/augmix/blob/master/imagenet.py
From paper: 'AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty -
https://arxiv.org/abs/1912.02781
"""
def __init__(self, ops, alpha=1., width=3, depth=-1, blended=False):
self.ops = ops
self.alpha = alpha
self.width = width
self.depth = depth
self.blended = blended # blended mode is faster but not well tested
def _calc_blended_weights(self, ws, m):
ws = ws * m
cump = 1.
rws = []
for w in ws[::-1]:
alpha = w / cump
cump *= (1 - alpha)
rws.append(alpha)
return np.array(rws[::-1], dtype=np.float32)
def _apply_blended(self, img, mixing_weights, m):
# This is my first crack at implementing a slightly faster mixed augmentation. Instead
# of accumulating the mix for each chain in a Numpy array and then blending with original,
# it recomputes the blending coefficients and applies one PIL image blend per chain.
# TODO the results appear in the right ballpark but they differ by more than rounding.
img_orig = img.copy()
ws = self._calc_blended_weights(mixing_weights, m)
for w in ws:
depth = self.depth if self.depth > 0 else np.random.randint(1, 4)
ops = np.random.choice(self.ops, depth, replace=True)
img_aug = img_orig # no ops are in-place, deep copy not necessary
for op in ops:
img_aug = op(img_aug)
img = Image.blend(img, img_aug, w)
return img
def _apply_basic(self, img, mixing_weights, m):
# This is a literal adaptation of the paper/official implementation without normalizations and
# PIL <-> Numpy conversions between every op. It is still quite CPU compute heavy compared to the
# typical augmentation transforms, could use a GPU / Kornia implementation.
img_shape = img.size[0], img.size[1], len(img.getbands())
mixed = np.zeros(img_shape, dtype=np.float32)
for mw in mixing_weights:
depth = self.depth if self.depth > 0 else np.random.randint(1, 4)
ops = np.random.choice(self.ops, depth, replace=True)
img_aug = img # no ops are in-place, deep copy not necessary
for op in ops:
img_aug = op(img_aug)
mixed += mw * np.asarray(img_aug, dtype=np.float32)
np.clip(mixed, 0, 255., out=mixed)
mixed = Image.fromarray(mixed.astype(np.uint8))
return Image.blend(img, mixed, m)
def __call__(self, img):
mixing_weights = np.float32(
np.random.dirichlet([self.alpha] * self.width))
m = np.float32(np.random.beta(self.alpha, self.alpha))
if self.blended:
mixed = self._apply_blended(img, mixing_weights, m)
else:
mixed = self._apply_basic(img, mixing_weights, m)
return mixed
def augment_and_mix_transform(config_str, hparams):
""" Create AugMix transform
:param config_str: String defining configuration of random augmentation. Consists of multiple sections separated by
dashes ('-'). The first section defines the variant of the augmentation (currently only 'augmix'). The remaining
sections, which are not order specific, determine
'm' - integer magnitude (severity) of augmentation mix (default: 3)
'w' - integer width of augmentation chain (default: 3)
'd' - integer depth of augmentation chain (-1 is random [1, 3], default: -1)
'b' - integer (bool), blend each branch of chain into end result without a final blend, less CPU (default: 0)
'mstd' - float std deviation of magnitude noise applied (default: 0)
Ex 'augmix-m5-w4-d2' results in AugMix with severity 5, chain width 4, chain depth 2
:param hparams: Other hparams (kwargs) for the Augmentation transforms
:return: A callable Transform Op
"""
magnitude = 3
width = 3
depth = -1
alpha = 1.
blended = False
config = config_str.split('-')
assert config[0] == 'augmix'
config = config[1:]
for c in config:
cs = re.split(r'(\d.*)', c)
if len(cs) < 2:
continue
key, val = cs[:2]
if key == 'mstd':
# noise param injected via hparams for now
hparams.setdefault('magnitude_std', float(val))
elif key == 'm':
magnitude = int(val)
elif key == 'w':
width = int(val)
elif key == 'd':
depth = int(val)
elif key == 'a':
alpha = float(val)
elif key == 'b':
blended = bool(val)
else:
assert False, 'Unknown AugMix config section'
ops = augmix_ops(magnitude=magnitude, hparams=hparams)
return AugMixAugment(
ops, alpha=alpha, width=width, depth=depth, blended=blended)
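A hedged usage sketch of the AugMix config grammar documented above; the import path and values are assumptions. "augmix-m5-w4-d2" gives severity 5, chain width 4 and chain depth 2.

from PIL import Image
from ppcls.data.preprocess.ops.timm_autoaugment import augment_and_mix_transform  # assumed path

hparams = dict(translate_const=100, img_mean=(128, 128, 128), translate_pct=0.3)
am = augment_and_mix_transform("augmix-m5-w4-d2", hparams)
out = am(Image.new("RGB", (224, 224), (127, 127, 127)))  # width chains mixed with Dirichlet weights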
class RawTimmAutoAugment(object):
"""TimmAutoAugment API for PaddleClas."""
def __init__(self,
config_str="rand-m9-mstd0.5-inc1",
interpolation="bicubic",
img_size=224,
mean=IMAGENET_DEFAULT_MEAN):
if isinstance(img_size, (tuple, list)):
img_size_min = min(img_size)
else:
img_size_min = img_size
aa_params = dict(
translate_const=int(img_size_min * 0.45),
img_mean=tuple([min(255, round(255 * x)) for x in mean]), )
if interpolation and interpolation != 'random':
aa_params['interpolation'] = _pil_interp(interpolation)
if config_str.startswith('rand'):
self.augment_func = rand_augment_transform(config_str, aa_params)
elif config_str.startswith('augmix'):
aa_params['translate_pct'] = 0.3
self.augment_func = augment_and_mix_transform(config_str,
aa_params)
elif config_str.startswith('auto'):
self.augment_func = auto_augment_transform(config_str, aa_params)
else:
raise Exception(
"ConfigError: The TimmAutoAugment Op only support RandAugment, AutoAugment, AugMix, and the config_str only starts with \"rand\", \"augmix\", \"auto\"."
)
def __call__(self, img):
return self.augment_func(img)

View File

@ -200,7 +200,7 @@ class Engine(object):
if self.mode == 'train':
self.optimizer, self.lr_sch = build_optimizer(
self.config["Optimizer"], self.config["Global"]["epochs"],
len(self.train_dataloader), self.model.parameters())
len(self.train_dataloader), [self.model])
# for distributed
self.config["Global"][
@ -355,7 +355,8 @@ class Engine(object):
def export(self):
assert self.mode == "export"
model = ExportModel(self.config["Arch"], self.model)
use_multilabel = self.config["Global"].get("use_multilabel", False)
model = ExportModel(self.config["Arch"], self.model, use_multilabel)
if self.config["Global"]["pretrained_model"] is not None:
load_dygraph_pretrain(model.base_model,
self.config["Global"]["pretrained_model"])
@ -388,10 +389,9 @@ class ExportModel(nn.Layer):
ExportModel: add softmax onto the model
"""
def __init__(self, config, model):
def __init__(self, config, model, use_multilabel):
super().__init__()
self.base_model = model
# we should choose a final model to export
if isinstance(self.base_model, DistillationModel):
self.infer_model_name = config["infer_model_name"]
@ -402,10 +402,13 @@ class ExportModel(nn.Layer):
if self.infer_output_key == "features" and isinstance(self.base_model,
RecModel):
self.base_model.head = IdentityHead()
if config.get("infer_add_softmax", True):
self.softmax = nn.Softmax(axis=-1)
if use_multilabel:
self.out_act = nn.Sigmoid()
else:
self.softmax = None
if config.get("infer_add_softmax", True):
self.out_act = nn.Softmax(axis=-1)
else:
self.out_act = None
def eval(self):
self.training = False
@ -421,6 +424,6 @@ class ExportModel(nn.Layer):
x = x[self.infer_model_name]
if self.infer_output_key is not None:
x = x[self.infer_output_key]
if self.softmax is not None:
x = self.softmax(x)
if self.out_act is not None:
x = self.out_act(x)
return x
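A hedged sketch of the activation choice introduced above: multi-label export applies an element-wise sigmoid so class scores are independent, while single-label export keeps the softmax over classes. Values are illustrative only.

import paddle
import paddle.nn as nn

logits = paddle.to_tensor([[2.0, -1.0, 0.5]])
print(nn.Sigmoid()(logits))         # independent per-class probabilities (multi-label)
print(nn.Softmax(axis=-1)(logits))  # probabilities summing to 1 (single-label)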

Some files were not shown because too many files have changed in this diff.