# mmpretrain/configs/clip/metafile.yml

309 lines
11 KiB
YAML
Raw Normal View History

Collections:
  - Name: CLIP
    Metadata:
      Architecture:
        - Attention Dropout
        - Convolution
        - Dense Connections
        - Dropout
        - GELU
        - Layer Normalization
        - Multi-Head Attention
        - Scaled Dot-Product Attention
        - Tanh Activation
    Paper:
      Title: Learning Transferable Visual Models From Natural Language Supervision
      URL: https://arxiv.org/abs/2103.00020
    README: configs/clip/README.md
    Code:
      URL: https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/models/backbones/vision_transformer.py
      Version: v1.0.0

Models:
  - Name: vit-base-p32_clip-openai-pre_3rdparty_in1k
    Metadata:
      FLOPs: 4364335104
      Parameters: 88225000
      Training Data:
        - OpenAI
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 81.77
          Top 5 Accuracy: 95.89
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_openai-pre_3rdparty_in1k_20221220-a0182ba9.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k
  - Name: vit-base-p32_clip-laion2b-pre_3rdparty_in1k
    Metadata:
      FLOPs: 4364335104
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 82.46
          Top 5 Accuracy: 96.12
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-pre_3rdparty_in1k_20221220-194df57f.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k
  - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k
    Metadata:
      FLOPs: 4364335104
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 83.06
          Top 5 Accuracy: 96.49
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k_20221220-b384e830.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k
  - Name: vit-base-p32_clip-openai-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 12661054464
      Parameters: 88225000
      Training Data:
        - OpenAI
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.13
          Top 5 Accuracy: 97.42
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_openai-in12k-pre_3rdparty_in1k-384px_20221220-dc2e49ea.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k
  - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 12661054464
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.39
          Top 5 Accuracy: 97.67
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k-384px_20221220-c7757552.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k
  - Name: vit-base-p16_clip-openai-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.3
          Top 5 Accuracy: 97.5
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-pre_3rdparty_in1k_20221220-c7d9c899.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k
  - Name: vit-base-p16_clip-laion2b-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.49
          Top 5 Accuracy: 97.59
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-pre_3rdparty_in1k_20221220-5e24ff58.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k
  - Name: vit-base-p16_clip-openai-in12k-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.99
          Top 5 Accuracy: 97.72
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-in12k-pre_3rdparty_in1k_20221220-90d930a8.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k
  - Name: vit-base-p16_clip-laion2b-in12k-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.02
          Top 5 Accuracy: 97.76
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-in12k-pre_3rdparty_in1k_20221220-a5e31f8c.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k
  - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k-448px
    Metadata:
      FLOPs: 17202416640
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.76
          Top 5 Accuracy: 97.63
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k-448px_20221220-ca404a7d.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k-448px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k
  - Name: vit-base-p16_clip-openai-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.25
          Top 5 Accuracy: 97.9
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-pre_3rdparty_in1k-384px_20221220-eb012e87.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k
  - Name: vit-base-p16_clip-laion2b-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.52
          Top 5 Accuracy: 97.97
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-pre_3rdparty_in1k-384px_20221220-558ed826.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k
  - Name: vit-base-p16_clip-openai-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.87
          Top 5 Accuracy: 98.05
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-in12k-pre_3rdparty_in1k-384px_20221220-8df86b74.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k
  - Name: vit-base-p16_clip-laion2b-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 87.17
          Top 5 Accuracy: 98.02
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-in12k-pre_3rdparty_in1k-384px_20221220-84ed0cc0.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k
  - Name: vit-large-p14_clip-openai-pre_3rdparty
    Metadata:
      FLOPs: 59696580608
      Parameters: 303302656
      Training Data:
        - OpenAI
    In Collection: CLIP
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/vit-large-p14_clip-openai-pre_3rdparty_20230517-95e2af0b.pth
    Config: configs/clip/vit-large-p14_headless.py
    Converted From:
      Code: https://github.com/mlfoundations/open_clip
      Weights: https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt
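
# Usage sketch (comment only, not part of the upstream metafile): each `Name`
# above is the identifier that mmpretrain's Python API resolves to the
# matching Config and Weights. Assuming a standard mmpretrain installation,
# something like the following should work; the demo image path is
# illustrative:
#
#   from mmpretrain import get_model, inference_model, list_models
#
#   # List the CLIP-pretrained classifiers registered by this metafile.
#   print(list_models('*clip*'))
#
#   # Build a listed model with its converted third-party weights.
#   model = get_model('vit-base-p32_clip-openai-pre_3rdparty_in1k',
#                     pretrained=True)
#
#   # Run single-image classification with the fine-tuned classifier.
#   result = inference_model(model, 'demo/demo.JPEG')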