diff --git a/__init__.py b/__init__.py
index b8b43616a..2128a6cc7 100644
--- a/__init__.py
+++ b/__init__.py
@@ -14,3 +14,4 @@ __all__ = ['PaddleClas']
 from .paddleclas import PaddleClas
+from ppcls.arch.backbone import *
diff --git a/docs/en/algorithm_introduction/knowledge_distillation.md b/docs/en/algorithm_introduction/knowledge_distillation.md
new file mode 100644
index 000000000..f528569b5
--- /dev/null
+++ b/docs/en/algorithm_introduction/knowledge_distillation.md
@@ -0,0 +1,94 @@
+# Knowledge Distillation
+
+---
+## Content
+
+* [1. Introduction of model compression methods](#1)
+* [2. Application of knowledge distillation](#2)
+* [3. Overview of knowledge distillation methods](#3)
+  * [3.1 Response based distillation](#3.1)
+  * [3.2 Feature based distillation](#3.2)
+  * [3.3 Relation based distillation](#3.3)
+* [4. Reference](#4)
+
+## 1. Introduction of model compression methods
+
+In recent years, deep neural networks have proved to be an extremely effective tool for solving problems in computer vision and natural language processing, and a well-designed neural network often performs better than traditional algorithms.
+
+When the amount of data is large enough, increasing the number of model parameters in a reasonable way can significantly improve model performance, but it also sharply increases model complexity, and larger models are more expensive to deploy.
+
+Deep neural networks generally contain redundant parameters. At present, there are several mainstream methods to compress models and reduce their parameter count, such as pruning, quantization and knowledge distillation. Knowledge distillation uses a teacher model to guide a student model on a specific task, so that the small model achieves relatively high performance, sometimes even comparable to that of the large model [1].
+
+Currently, knowledge distillation methods can be roughly divided into the following three types.
+
+* Response based distillation: the output of the student model is guided by the output of the teacher model.
+* Feature based distillation: the inner feature maps of the student model are guided by those of the teacher model.
+* Relation based distillation: for a batch of samples, the correlation of features between samples is computed with both the teacher model and the student model, and the goal is to make the correlation matrices of the two models as consistent as possible.
+
+## 2. Application of knowledge distillation
+
+Knowledge distillation is widely used to obtain lightweight models. For tasks that must meet a specific accuracy requirement, knowledge distillation makes it possible to reach the required accuracy with a smaller model, thereby reducing the cost of model deployment.
+
+Moreover, for the same model structure, pre-trained models obtained with knowledge distillation often perform better, and these pre-trained models can also improve the performance of downstream tasks. For example, a more accurate pre-trained image classification model can bring significant accuracy gains to tasks such as object detection, image segmentation, OCR, and video classification.
+
+## 3. Overview of knowledge distillation methods
+
+### 3.1 Response based distillation
+
+The knowledge distillation algorithm was first proposed by Hinton et al. and is known as KD [1]. In addition to the standard cross entropy loss, a KL divergence loss between the outputs of the student model and the teacher model is added to the total training loss. Note that a larger teacher model is needed to guide the training process of the student model.
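+
+A minimal sketch of this response based loss is shown below. It only illustrates the idea under common assumptions and is not the exact loss implementation used in PaddleClas; the function name and the `alpha`/`temperature` weighting are illustrative.
+
+```python
+import paddle.nn.functional as F
+
+
+def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
+    # Illustrative sketch only, not the PaddleClas implementation.
+    # Standard cross entropy against the ground-truth labels.
+    ce = F.cross_entropy(student_logits, labels)
+    # KL divergence between the softened student and teacher distributions,
+    # scaled by T^2 to keep gradient magnitudes comparable.
+    log_p_s = F.log_softmax(student_logits / temperature, axis=-1)
+    p_t = F.softmax(teacher_logits / temperature, axis=-1)
+    kl = F.kl_div(log_p_s, p_t, reduction="mean") * temperature * temperature
+    # The total training loss is a weighted sum of the two terms.
+    return (1.0 - alpha) * ce + alpha * kl
+```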
+
+PaddleClas proposed a simple yet effective knowledge distillation algorithm called SSLD [6]. Ground-truth labels are not needed for SSLD, so unlabeled data can also be used for training. With SSLD, the accuracy of 15 models was improved by more than 3%.
+
+The distillation methods mentioned above need a teacher model to guide the training process of the student model. Deep Mutual Learning (DML) was later proposed [7], in which two models with the same architecture learn from each other to obtain higher accuracy. Compared with KD and other knowledge distillation algorithms that rely on a large teacher model, DML does not depend on a large teacher model, so the distillation training process is simpler.
+
+### 3.2 Feature based distillation
+
+Heo et al. proposed OverHaul [8], which uses the distance between the feature maps of the student model and the teacher model as the distillation loss. The feature maps of the student model are aligned with those of the teacher model so that the distance between them can be calculated.
+
+Feature based distillation can also be integrated with the response based distillation described in Section 3.1, so that both the inner feature maps and the output of the student model are guided during training. For the DML method, this integration is simpler, because the alignment step is not needed since the two models share exactly the same architecture. This integration is used in the PP-OCRv2 system [9] and greatly improves the accuracy of the OCR text recognition model.
+
+### 3.3 Relation based distillation
+
+The methods in Sections `3.1` and `3.2` mainly consider the inner feature maps or the final output of the student model and the teacher model. These knowledge distillation algorithms only focus on the output for a single sample and do not consider the relationship between the outputs of different samples.
+
+Park et al. proposed RKD [10], a relation based knowledge distillation algorithm. RKD further considers the relationship between different samples and uses two loss functions: a second-order distance loss (distance-wise) and a third-order angle loss (angle-wise). The final distillation loss combines the KD loss and the RKD loss, and the resulting accuracy is better than that of a model trained with the KD loss alone.
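+
+As a rough illustration of the distance-wise term, the sketch below compares the mean-normalized pairwise distance matrices of the student and teacher embeddings. It is only a simplified sketch of the RKD idea (the angle-wise term and the exact normalization used in the paper are omitted); the function names are assumptions for illustration, not the PaddleClas implementation.
+
+```python
+import paddle
+import paddle.nn.functional as F
+
+
+def pairwise_distances(feat):
+    # feat: [N, C] embeddings; returns the [N, N] Euclidean distance matrix.
+    diff = feat.unsqueeze(1) - feat.unsqueeze(0)
+    return paddle.sqrt((diff * diff).sum(-1) + 1e-12)
+
+
+def rkd_distance_loss(student_feat, teacher_feat):
+    # Illustrative sketch only: normalize each distance matrix by its mean,
+    # then match the two relational structures with a smooth L1 loss.
+    d_s = pairwise_distances(student_feat)
+    d_t = pairwise_distances(teacher_feat)
+    d_s = d_s / (d_s.mean() + 1e-12)
+    d_t = d_t / (d_t.mean() + 1e-12)
+    return F.smooth_l1_loss(d_s, d_t)
+```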
+
+## 4. Reference
+
+[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
+
+[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
+
+[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
+
+[4] Cubuk E D, Zoph B, Mane D, et al. AutoAugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 113-123.
+
+[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.
+
+[6] Cui C, Guo R, Du Y, et al. Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones[J]. arXiv preprint arXiv:2103.05959, 2021.
+
+[7] Zhang Y, Xiang T, Hospedales T M, et al. Deep mutual learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4320-4328.
+
+[8] Heo B, Kim J, Yun S, et al. A comprehensive overhaul of feature distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 1921-1930.
+
+[9] Du Y, Li C, Guo R, et al. PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System[J]. arXiv preprint arXiv:2109.03144, 2021.
+
+[10] Park W, Kim D, Lu Y, et al. Relational knowledge distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3967-3976.
diff --git a/docs/en/introduction/more_demo/cartoon.md b/docs/en/introduction/more_demo/cartoon.md
new file mode 100644
index 000000000..042fbf2e1
--- /dev/null
+++ b/docs/en/introduction/more_demo/cartoon.md
@@ -0,0 +1,53 @@
+# Cartoon Demo Images
+
diff --git a/docs/en/introduction/more_demo/logo.md b/docs/en/introduction/more_demo/logo.md new file mode 100644 index 000000000..0b8ee1b64 --- /dev/null +++ b/docs/en/introduction/more_demo/logo.md @@ -0,0 +1,65 @@ +# Logo Demo Images + +
diff --git a/docs/en/introduction/more_demo/more_demo.md b/docs/en/introduction/more_demo/more_demo.md new file mode 100644 index 000000000..1e8d00d01 --- /dev/null +++ b/docs/en/introduction/more_demo/more_demo.md @@ -0,0 +1,34 @@ +## Demo images +- Product recognition +
+ +[More demo images](product.md) + +- Cartoon character recognition +
+ +[More demo images](cartoon.md) + +- Logo recognition +
+ +[More demo images](logo.md) + +- Car recognition +
+ +[More demo images](vehicle.md) diff --git a/docs/en/introduction/more_demo/product.md b/docs/en/introduction/more_demo/product.md new file mode 100644 index 000000000..f80203fe4 --- /dev/null +++ b/docs/en/introduction/more_demo/product.md @@ -0,0 +1,179 @@ +# Product Demo Images + +
diff --git a/docs/en/introduction/more_demo/vehicle.md b/docs/en/introduction/more_demo/vehicle.md new file mode 100644 index 000000000..2f37e3973 --- /dev/null +++ b/docs/en/introduction/more_demo/vehicle.md @@ -0,0 +1,33 @@ +# Vehicle Demo Images + +
diff --git a/docs/en/more_demo.md b/docs/en/more_demo.md deleted file mode 100644 index 575ddc98a..000000000 --- a/docs/en/more_demo.md +++ /dev/null @@ -1,61 +0,0 @@ -## Demo images -- Product recognition -
- -- Cartoon character recognition -
- -- Logo recognition -
- -- Car recognition -
- -[More demo images](../images/recognition/more_demo_images) diff --git a/docs/images/recognition/more_demo_images/cartoon.md b/docs/zh_CN/introduction/more_demo/cartoon.md similarity index 100% rename from docs/images/recognition/more_demo_images/cartoon.md rename to docs/zh_CN/introduction/more_demo/cartoon.md diff --git a/docs/images/recognition/more_demo_images/logo.md b/docs/zh_CN/introduction/more_demo/logo.md similarity index 100% rename from docs/images/recognition/more_demo_images/logo.md rename to docs/zh_CN/introduction/more_demo/logo.md diff --git a/docs/zh_CN/introduction/more_demo.md b/docs/zh_CN/introduction/more_demo/more_demo.md similarity index 92% rename from docs/zh_CN/introduction/more_demo.md rename to docs/zh_CN/introduction/more_demo/more_demo.md index f12d44be0..3ffc58ef9 100644 --- a/docs/zh_CN/introduction/more_demo.md +++ b/docs/zh_CN/introduction/more_demo/more_demo.md @@ -6,7 +6,7 @@
-[更多效果图](../../images/recognition/more_demo_images/product.md) +[更多效果图](product.md) - 动漫人物识别 @@ -15,7 +15,7 @@
-[更多效果图](../../images/recognition/more_demo_images/cartoon.md) +[更多效果图](cartoon.md) - logo识别 @@ -24,16 +24,10 @@
-
- - - - - [更多效果图](../../images/recognition/more_demo_images/logo.md) @@ -43,4 +37,4 @@
-[更多效果图](../../images/recognition/more_demo_images/vehicle.md) +[更多效果图](vehicle.md) diff --git a/docs/images/recognition/more_demo_images/product.md b/docs/zh_CN/introduction/more_demo/product.md similarity index 100% rename from docs/images/recognition/more_demo_images/product.md rename to docs/zh_CN/introduction/more_demo/product.md diff --git a/docs/images/recognition/more_demo_images/vehicle.md b/docs/zh_CN/introduction/more_demo/vehicle.md similarity index 100% rename from docs/images/recognition/more_demo_images/vehicle.md rename to docs/zh_CN/introduction/more_demo/vehicle.md diff --git a/docs/zh_CN/others/more_demo.md b/docs/zh_CN/others/more_demo.md deleted file mode 100644 index 14ed1a3dd..000000000 --- a/docs/zh_CN/others/more_demo.md +++ /dev/null @@ -1,61 +0,0 @@ -## 识别效果展示 -- 商品识别 -
- -- 动漫人物识别 -
- -- logo 识别 -
- -- 车辆识别 -
- -[更多效果图](../../images/recognition/more_demo_images) diff --git a/paddleclas.py b/paddleclas.py index 91cd030ab..bfad1931b 100644 --- a/paddleclas.py +++ b/paddleclas.py @@ -38,6 +38,10 @@ from deploy.utils.get_image_list import get_image_list from deploy.utils import config from ppcls.arch.backbone import * +from ppcls.utils.logger import init_logger + +# for building model with loading pretrained weights from backbone +init_logger() __all__ = ["PaddleClas"] @@ -58,20 +62,27 @@ MODEL_SERIES = { "DenseNet121", "DenseNet161", "DenseNet169", "DenseNet201", "DenseNet264" ], + "DLA": [ + "DLA46_c", "DLA60x_c", "DLA34", "DLA60", "DLA60x", "DLA102", "DLA102x", + "DLA102x2", "DLA169" + ], "DPN": ["DPN68", "DPN92", "DPN98", "DPN107", "DPN131"], "EfficientNet": [ "EfficientNetB0", "EfficientNetB0_small", "EfficientNetB1", "EfficientNetB2", "EfficientNetB3", "EfficientNetB4", "EfficientNetB5", "EfficientNetB6", "EfficientNetB7" ], + "ESNet": ["ESNet_x0_25", "ESNet_x0_5", "ESNet_x0_75", "ESNet_x1_0"], "GhostNet": ["GhostNet_x0_5", "GhostNet_x1_0", "GhostNet_x1_3", "GhostNet_x1_3_ssld"], + "HarDNet": ["HarDNet39_ds", "HarDNet68_ds", "HarDNet68", "HarDNet85"], "HRNet": [ "HRNet_W18_C", "HRNet_W30_C", "HRNet_W32_C", "HRNet_W40_C", "HRNet_W44_C", "HRNet_W48_C", "HRNet_W64_C", "HRNet_W18_C_ssld", "HRNet_W48_C_ssld" ], "Inception": ["GoogLeNet", "InceptionV3", "InceptionV4"], + "MixNet": ["MixNet_S", "MixNet_M", "MixNet_L"], "MobileNetV1": [ "MobileNetV1_x0_25", "MobileNetV1_x0_5", "MobileNetV1_x0_75", "MobileNetV1", "MobileNetV1_ssld" @@ -89,6 +100,11 @@ MODEL_SERIES = { "MobileNetV3_large_x1_0", "MobileNetV3_large_x1_25", "MobileNetV3_small_x1_0_ssld", "MobileNetV3_large_x1_0_ssld" ], + "PPLCNet": [ + "PPLCNet_x0_25", "PPLCNet_x0_35", "PPLCNet_x0_5", "PPLCNet_x0_75", + "PPLCNet_x1_0", "PPLCNet_x1_5", "PPLCNet_x2_0", "PPLCNet_x2_5" + ], + "RedNet": ["RedNet26", "RedNet38", "RedNet50", "RedNet101", "RedNet152"], "RegNet": ["RegNetX_4GF"], "Res2Net": [ "Res2Net50_14w_8s", "Res2Net50_26w_4s", "Res2Net50_vd_26w_4s", @@ -113,6 +129,8 @@ MODEL_SERIES = { "ResNeXt152_32x4d", "ResNeXt152_vd_32x4d", "ResNeXt152_64x4d", "ResNeXt152_vd_64x4d" ], + "ReXNet": + ["ReXNet_1_0", "ReXNet_1_3", "ReXNet_1_5", "ReXNet_2_0", "ReXNet_3_0"], "SENet": [ "SENet154_vd", "SE_HRNet_W64_C_ssld", "SE_ResNet18_vd", "SE_ResNet34_vd", "SE_ResNet50_vd", "SE_ResNeXt50_32x4d", @@ -134,6 +152,10 @@ MODEL_SERIES = { "SwinTransformer_small_patch4_window7_224", "SwinTransformer_tiny_patch4_window7_224" ], + "Twins": [ + "pcpvt_small", "pcpvt_base", "pcpvt_large", "alt_gvt_small", + "alt_gvt_base", "alt_gvt_large" + ], "VGG": ["VGG11", "VGG13", "VGG16", "VGG19"], "VisionTransformer": [ "ViT_base_patch16_224", "ViT_base_patch16_384", "ViT_base_patch32_384", @@ -465,24 +487,23 @@ class PaddleClas(object): """Predict input_data. Args: - input_data (Union[str, np.array]): + input_data (Union[str, np.array]): When the type is str, it is the path of image, or the directory containing images, or the URL of image from Internet. When the type is np.array, it is the image data whose channel order is RGB. - print_pred (bool, optional): Whether print the prediction result. Defaults to False. Defaults to False. + print_pred (bool, optional): Whether print the prediction result. Defaults to False. Raises: ImageTypeError: Illegal input_data. Yields: - Generator[list, None, None]: - The prediction result(s) of input_data by batch_size. For every one image, - prediction result(s) is zipped as a dict, that includs topk "class_ids", "scores" and "label_names". 
- The format is as follow: [{"class_ids": [...], "scores": [...], "label_names": [...]}, ...] + Generator[list, None, None]: + The prediction result(s) of input_data by batch_size. For every one image, + prediction result(s) is zipped as a dict, that includs topk "class_ids", "scores" and "label_names". + The format of batch prediction result(s) is as follow: [{"class_ids": [...], "scores": [...], "label_names": [...]}, ...] """ if isinstance(input_data, np.ndarray): - outputs = self.cls_predictor.predict(input_data) - yield self.cls_predictor.postprocess(outputs) + yield self.cls_predictor.predict(input_data) elif isinstance(input_data, str): if input_data.startswith("http") or input_data.startswith("https"): image_storage_dir = partial(os.path.join, BASE_IMAGES_DIR) @@ -497,7 +518,7 @@ class PaddleClas(object): image_list = get_image_list(input_data) batch_size = self._config.Global.get("batch_size", 1) - topk = self._config.PostProcess.get('topk', 1) + topk = self._config.PostProcess.Topk.get('topk', 1) img_list = [] img_path_list = [] @@ -515,16 +536,15 @@ class PaddleClas(object): cnt += 1 if cnt % batch_size == 0 or (idx + 1) == len(image_list): - outputs = self.cls_predictor.predict(img_list) - preds = self.cls_predictor.postprocess(outputs, - img_path_list) + preds = self.cls_predictor.predict(img_list) + if print_pred and preds: - for pred in preds: - filename = pred.pop("file_name") + for idx, pred in enumerate(preds): pred_str = ", ".join( [f"{k}: {pred[k]}" for k in pred]) print( - f"filename: {filename}, top-{topk}, {pred_str}") + f"filename: {img_path_list[idx]}, top-{topk}, {pred_str}" + ) img_list = [] img_path_list = [] diff --git a/ppcls/arch/backbone/__init__.py b/ppcls/arch/backbone/__init__.py index 1764830dc..a2987a2f2 100644 --- a/ppcls/arch/backbone/__init__.py +++ b/ppcls/arch/backbone/__init__.py @@ -48,7 +48,7 @@ from ppcls.arch.backbone.model_zoo.resnext101_wsl import ResNeXt101_32x8d_wsl, R from ppcls.arch.backbone.model_zoo.squeezenet import SqueezeNet1_0, SqueezeNet1_1 from ppcls.arch.backbone.model_zoo.darknet import DarkNet53 from ppcls.arch.backbone.model_zoo.regnet import RegNetX_200MF, RegNetX_4GF, RegNetX_32GF, RegNetY_200MF, RegNetY_4GF, RegNetY_32GF -from ppcls.arch.backbone.model_zoo.vision_transformer import ViT_small_patch16_224, ViT_base_patch16_224, ViT_base_patch16_384, ViT_base_patch32_384, ViT_large_patch16_224, ViT_large_patch16_384, ViT_large_patch32_384, ViT_huge_patch16_224, ViT_huge_patch32_384 +from ppcls.arch.backbone.model_zoo.vision_transformer import ViT_small_patch16_224, ViT_base_patch16_224, ViT_base_patch16_384, ViT_base_patch32_384, ViT_large_patch16_224, ViT_large_patch16_384, ViT_large_patch32_384 from ppcls.arch.backbone.model_zoo.distilled_vision_transformer import DeiT_tiny_patch16_224, DeiT_small_patch16_224, DeiT_base_patch16_224, DeiT_tiny_distilled_patch16_224, DeiT_small_distilled_patch16_224, DeiT_base_distilled_patch16_224, DeiT_base_patch16_384, DeiT_base_distilled_patch16_384 from ppcls.arch.backbone.model_zoo.swin_transformer import SwinTransformer_tiny_patch4_window7_224, SwinTransformer_small_patch4_window7_224, SwinTransformer_base_patch4_window7_224, SwinTransformer_base_patch4_window12_384, SwinTransformer_large_patch4_window7_224, SwinTransformer_large_patch4_window12_384 from ppcls.arch.backbone.model_zoo.mixnet import MixNet_S, MixNet_M, MixNet_L @@ -65,6 +65,7 @@ from ppcls.arch.backbone.variant_models.vgg_variant import VGG19Sigmoid from ppcls.arch.backbone.variant_models.pp_lcnet_variant import 
PPLCNet_x2_5_Tanh +# help whl get all the models' api (class type) and components' api (func type) def get_apis(): current_func = sys._getframe().f_code.co_name current_module = sys.modules[__name__] diff --git a/ppcls/arch/backbone/model_zoo/vision_transformer.py b/ppcls/arch/backbone/model_zoo/vision_transformer.py index 5aaf8cc35..c71c0262f 100644 --- a/ppcls/arch/backbone/model_zoo/vision_transformer.py +++ b/ppcls/arch/backbone/model_zoo/vision_transformer.py @@ -38,10 +38,6 @@ MODEL_URLS = { "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams", "ViT_large_patch32_384": "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams", - "ViT_huge_patch16_224": - "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_huge_patch16_224_pretrained.pdparams", - "ViT_huge_patch32_384": - "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_huge_patch32_384_pretrained.pdparams" } __all__ = list(MODEL_URLS.keys()) @@ -460,36 +456,3 @@ def ViT_large_patch32_384(pretrained=False, use_ssld=False, **kwargs): MODEL_URLS["ViT_large_patch32_384"], use_ssld=use_ssld) return model - - -def ViT_huge_patch16_224(pretrained=False, use_ssld=False, **kwargs): - model = VisionTransformer( - patch_size=16, - embed_dim=1280, - depth=32, - num_heads=16, - mlp_ratio=4, - **kwargs) - _load_pretrained( - pretrained, - model, - MODEL_URLS["ViT_huge_patch16_224"], - use_ssld=use_ssld) - return model - - -def ViT_huge_patch32_384(pretrained=False, use_ssld=False, **kwargs): - model = VisionTransformer( - img_size=384, - patch_size=32, - embed_dim=1280, - depth=32, - num_heads=16, - mlp_ratio=4, - **kwargs) - _load_pretrained( - pretrained, - model, - MODEL_URLS["ViT_huge_patch32_384"], - use_ssld=use_ssld) - return model diff --git a/ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch16_224.yaml b/ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch16_224.yaml deleted file mode 100644 index 7ffe85972..000000000 --- a/ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch16_224.yaml +++ /dev/null @@ -1,130 +0,0 @@ -# global configs -Global: - checkpoints: null - pretrained_model: null - output_dir: ./output/ - device: gpu - save_interval: 1 - eval_during_train: True - eval_interval: 1 - epochs: 120 - print_batch_step: 10 - use_visualdl: False - # used for static mode and model export - image_shape: [3, 224, 224] - save_inference_dir: ./inference - -# model architecture -Arch: - name: ViT_huge_patch16_224 - class_num: 1000 - -# loss function config for traing/eval process -Loss: - Train: - - CELoss: - weight: 1.0 - Eval: - - CELoss: - weight: 1.0 - - -Optimizer: - name: Momentum - momentum: 0.9 - lr: - name: Piecewise - learning_rate: 0.1 - decay_epochs: [30, 60, 90] - values: [0.1, 0.01, 0.001, 0.0001] - regularizer: - name: 'L2' - coeff: 0.0001 - - -# data loader for train and eval -DataLoader: - Train: - dataset: - name: ImageNetDataset - image_root: ./dataset/ILSVRC2012/ - cls_label_path: ./dataset/ILSVRC2012/train_list.txt - transform_ops: - - DecodeImage: - to_rgb: True - channel_first: False - - RandCropImage: - size: 224 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.5, 0.5, 0.5] - std: [0.5, 0.5, 0.5] - order: '' - - sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True - loader: - num_workers: 4 - use_shared_memory: True - - Eval: - dataset: - name: ImageNetDataset - image_root: ./dataset/ILSVRC2012/ - cls_label_path: 
./dataset/ILSVRC2012/val_list.txt - transform_ops: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.5, 0.5, 0.5] - std: [0.5, 0.5, 0.5] - order: '' - sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False - loader: - num_workers: 4 - use_shared_memory: True - -Infer: - infer_imgs: docs/images/whl/demo.jpg - batch_size: 10 - transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 256 - - CropImage: - size: 224 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.5, 0.5, 0.5] - std: [0.5, 0.5, 0.5] - order: '' - - ToCHWImage: - PostProcess: - name: Topk - topk: 5 - class_id_map_file: ppcls/utils/imagenet1k_label_list.txt - -Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: - - TopkAcc: - topk: [1, 5] diff --git a/ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch32_384.yaml b/ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch32_384.yaml deleted file mode 100644 index 14d892e34..000000000 --- a/ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch32_384.yaml +++ /dev/null @@ -1,130 +0,0 @@ -# global configs -Global: - checkpoints: null - pretrained_model: null - output_dir: ./output/ - device: gpu - save_interval: 1 - eval_during_train: True - eval_interval: 1 - epochs: 120 - print_batch_step: 10 - use_visualdl: False - # used for static mode and model export - image_shape: [3, 384, 384] - save_inference_dir: ./inference - -# model architecture -Arch: - name: ViT_huge_patch32_384 - class_num: 1000 - -# loss function config for traing/eval process -Loss: - Train: - - CELoss: - weight: 1.0 - Eval: - - CELoss: - weight: 1.0 - - -Optimizer: - name: Momentum - momentum: 0.9 - lr: - name: Piecewise - learning_rate: 0.1 - decay_epochs: [30, 60, 90] - values: [0.1, 0.01, 0.001, 0.0001] - regularizer: - name: 'L2' - coeff: 0.0001 - - -# data loader for train and eval -DataLoader: - Train: - dataset: - name: ImageNetDataset - image_root: ./dataset/ILSVRC2012/ - cls_label_path: ./dataset/ILSVRC2012/train_list.txt - transform_ops: - - DecodeImage: - to_rgb: True - channel_first: False - - RandCropImage: - size: 384 - - RandFlipImage: - flip_code: 1 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.5, 0.5, 0.5] - std: [0.5, 0.5, 0.5] - order: '' - - sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: True - loader: - num_workers: 4 - use_shared_memory: True - - Eval: - dataset: - name: ImageNetDataset - image_root: ./dataset/ILSVRC2012/ - cls_label_path: ./dataset/ILSVRC2012/val_list.txt - transform_ops: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 384 - - CropImage: - size: 384 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.5, 0.5, 0.5] - std: [0.5, 0.5, 0.5] - order: '' - sampler: - name: DistributedBatchSampler - batch_size: 64 - drop_last: False - shuffle: False - loader: - num_workers: 4 - use_shared_memory: True - -Infer: - infer_imgs: docs/images/whl/demo.jpg - batch_size: 10 - transforms: - - DecodeImage: - to_rgb: True - channel_first: False - - ResizeImage: - resize_short: 384 - - CropImage: - size: 384 - - NormalizeImage: - scale: 1.0/255.0 - mean: [0.5, 0.5, 0.5] - std: [0.5, 0.5, 0.5] - order: '' - - ToCHWImage: - PostProcess: - name: Topk - topk: 5 - class_id_map_file: ppcls/utils/imagenet1k_label_list.txt - -Metric: - Train: - - TopkAcc: - topk: [1, 5] - Eval: - - TopkAcc: - topk: [1, 5] diff --git 
a/test_tipc/config/VisionTransformer/ViT_huge_patch16_224_train_infer_python.txt b/test_tipc/config/VisionTransformer/ViT_huge_patch16_224_train_infer_python.txt deleted file mode 100644 index 8b83f4d59..000000000 --- a/test_tipc/config/VisionTransformer/ViT_huge_patch16_224_train_infer_python.txt +++ /dev/null @@ -1,52 +0,0 @@ -===========================train_params=========================== -model_name:ViT_huge_patch16_224 -python:python3.7 -gpu_list:0|0,1 --o Global.device:gpu --o Global.auto_cast:null --o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 --o Global.output_dir:./output/ --o DataLoader.Train.sampler.batch_size:8 --o Global.pretrained_model:null -train_model_name:latest -train_infer_img_dir:./dataset/ILSVRC2012/val -null:null -## -trainer:norm_train -norm_train:tools/train.py -c ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch16_224.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -pact_train:null -fpgm_train:null -distill_train:null -null:null -null:null -## -===========================eval_params=========================== -eval:tools/eval.py -c ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch16_224.yaml -null:null -## -===========================infer_params========================== --o Global.save_inference_dir:./inference --o Global.pretrained_model: -norm_export:tools/export_model.py -c ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch16_224.yaml -quant_export:null -fpgm_export:null -distill_export:null -kl_quant:null -export2:null -pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_huge_patch16_224_pretrained.pdparams -infer_model:../inference/ -infer_export:True -infer_quant:Fasle -inference:python/predict_cls.py -c configs/inference_cls.yaml --o Global.use_gpu:True|False --o Global.enable_mkldnn:True|False --o Global.cpu_num_threads:1|6 --o Global.batch_size:1|16 --o Global.use_tensorrt:True|False --o Global.use_fp16:True|False --o Global.inference_model_dir:../inference --o Global.infer_imgs:../dataset/ILSVRC2012/val --o Global.save_log_path:null --o Global.benchmark:True -null:null -null:null diff --git a/test_tipc/config/VisionTransformer/ViT_huge_patch32_384_train_infer_python.txt b/test_tipc/config/VisionTransformer/ViT_huge_patch32_384_train_infer_python.txt deleted file mode 100644 index 8aaf12745..000000000 --- a/test_tipc/config/VisionTransformer/ViT_huge_patch32_384_train_infer_python.txt +++ /dev/null @@ -1,52 +0,0 @@ -===========================train_params=========================== -model_name:ViT_huge_patch32_384 -python:python3.7 -gpu_list:0|0,1 --o Global.device:gpu --o Global.auto_cast:null --o Global.epochs:lite_train_lite_infer=2|whole_train_whole_infer=120 --o Global.output_dir:./output/ --o DataLoader.Train.sampler.batch_size:2 --o Global.pretrained_model:null -train_model_name:latest -train_infer_img_dir:./dataset/ILSVRC2012/val -null:null -## -trainer:norm_train -norm_train:tools/train.py -c ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch32_384.yaml -o Global.seed=1234 -o DataLoader.Train.sampler.shuffle=False -o DataLoader.Train.loader.num_workers=0 -o DataLoader.Train.loader.use_shared_memory=False -pact_train:null -fpgm_train:null -distill_train:null -null:null -null:null -## -===========================eval_params=========================== -eval:tools/eval.py -c ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch32_384.yaml -null:null -## 
-===========================infer_params========================== --o Global.save_inference_dir:./inference --o Global.pretrained_model: -norm_export:tools/export_model.py -c ppcls/configs/ImageNet/VisionTransformer/ViT_huge_patch32_384.yaml -quant_export:null -fpgm_export:null -distill_export:null -kl_quant:null -export2:null -pretrained_model_url:https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_huge_patch32_384_pretrained.pdparams -infer_model:../inference/ -infer_export:True -infer_quant:Fasle -inference:python/predict_cls.py -c configs/inference_cls.yaml -o PreProcess.transform_ops.0.ResizeImage.resize_short=384 -o PreProcess.transform_ops.1.CropImage.size=384 --o Global.use_gpu:True|False --o Global.enable_mkldnn:True|False --o Global.cpu_num_threads:1|6 --o Global.batch_size:1|16 --o Global.use_tensorrt:True|False --o Global.use_fp16:True|False --o Global.inference_model_dir:../inference --o Global.infer_imgs:../dataset/ILSVRC2012/val --o Global.save_log_path:null --o Global.benchmark:True -null:null -null:null diff --git a/test_tipc/test_train_inference_python.sh b/test_tipc/test_train_inference_python.sh index 0c09446e9..cecfa93e7 100644 --- a/test_tipc/test_train_inference_python.sh +++ b/test_tipc/test_train_inference_python.sh @@ -288,6 +288,7 @@ else # run train eval "unset CUDA_VISIBLE_DEVICES" export FLAGS_cudnn_deterministic=True + sleep 5 eval $cmd status_check $? "${cmd}" "${status_log}" sleep 5