mirror of
https://github.com/open-mmlab/mmpretrain.git
synced 2025-06-03 14:59:18 +08:00
* update dev_scripts * update metafile * update multimodal floating range * fix lint * update * update * fix lint * Update metric map --------- Co-authored-by: mzr1996 <mzr1996@163.com>
100 lines
3.4 KiB
YAML
100 lines
3.4 KiB
YAML
Collections:
|
|
- Name: BLIP
|
|
Metadata:
|
|
Training Data:
|
|
- COCO
|
|
- VG
|
|
- Conceptual Captions
|
|
- Conceptual 12M
|
|
- SBU captions
|
|
Architecture:
|
|
- Transformer
|
|
Training Resources: 8x A100 GPUs
|
|
Paper:
|
|
Title: 'BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language
|
|
Understanding and Generation'
|
|
URL: https://arxiv.org/abs/2201.12086
|
|
README: configs/blip/README.md
|
|
|
|
Models:
|
|
- Name: blip-base_8xb16_refcoco
|
|
Metadata:
|
|
FLOPs: null
|
|
Parameters: 498488636
|
|
In Collection: BLIP
|
|
Results:
|
|
- Task: Visual Grounding
|
|
Dataset: RefCOCO
|
|
Metrics:
|
|
Accuracy (testA): 86.14
|
|
Accuracy (testB): 77.33
|
|
Weights: https://download.openmmlab.com/mmclassification/v1/blip/blip-base_8xb16_refcoco_20230508-d2d10f4c.pth
|
|
Config: configs/blip/blip-base_8xb16_refcoco.py
|
|
- Name: blip-base_3rdparty_caption
|
|
Metadata:
|
|
FLOPs: null
|
|
Parameters: 223971644
|
|
In Collection: BLIP
|
|
Results:
|
|
- Dataset: COCO
|
|
Task: Image Caption
|
|
Metrics:
|
|
BLEU-4: 40.12
|
|
CIDER: 132.82
|
|
Weights: https://download.openmmlab.com/mmclassification/v1/blip/blip-base_3rdparty_coco-caption_20230419-a5b71af3.pth
|
|
Config: configs/blip/blip-base_8xb32_caption.py
|
|
Converted From:
|
|
Weights: https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_caption_base.pth
|
|
Code: https://github.com/salesforce/LAVIS
|
|
- Name: blip-base_3rdparty_nlvr
|
|
Metadata:
|
|
FLOPs: null
|
|
Parameters: 259372034
|
|
In Collection: BLIP
|
|
Results:
|
|
- Task: NLVR
|
|
Dataset: NLVR2
|
|
Metrics:
|
|
Top 1 Accuracy: 82.33
|
|
Weights: https://download.openmmlab.com/mmclassification/v1/blip/blip-base_3rdparty_nlvr_20230427-3b14d33f.pth
|
|
Config: configs/blip/blip-base_8xb32_nlvr.py
|
|
Converted From:
|
|
Weights: https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_nlvr.pth
|
|
Code: https://github.com/salesforce/LAVIS
|
|
- Name: blip-base_3rdparty_vqa
|
|
Metadata:
|
|
FLOPs: null
|
|
Parameters: 361478972
|
|
In Collection: BLIP
|
|
Results:
|
|
- Task: Visual Question Answering
|
|
Dataset: VQAv2
|
|
Metrics:
|
|
Accuracy: 78.2
|
|
Weights: https://download.openmmlab.com/mmclassification/v1/blip/blip-base_3rdparty-capflit_vqa_20230505-81488941.pth
|
|
Config: configs/blip/blip-base_8xb32_vqa.py
|
|
Converted From:
|
|
Weights: https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_vqa_capfilt_large.pth
|
|
Code: https://github.com/salesforce/LAVIS
|
|
- Name: blip-base_3rdparty_retrieval
|
|
Metadata:
|
|
FLOPs: null
|
|
Parameters: 447486979
|
|
In Collection: BLIP
|
|
Results:
|
|
- Task: Image-To-Text Retrieval
|
|
Dataset: COCO
|
|
Metrics:
|
|
Recall@1: 82.52
|
|
Recall@5: 95.34
|
|
- Task: Text-To-Image Retrieval
|
|
Dataset: COCO
|
|
Metrics:
|
|
Recall@1: 64.82
|
|
Recall@5: 86.28
|
|
Weights: https://download.openmmlab.com/mmclassification/v1/blip/blip-base_3rdparty_coco-retrieval_20230419-a1804d2c.pth
|
|
Config: configs/blip/blip-base_8xb32_retrieval.py
|
|
Converted From:
|
|
Weights: https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_retrieval.pth
|
|
Code: https://github.com/salesforce/LAVIS
|