mirror of https://github.com/open-mmlab/mmclassification.git synced 2025-06-03 21:53:55 +08:00

[Feature] Support multiple multi-modal algorithms and inferencers. (#1561 )

2023-05-19 16:50:04 +08:00


Prerequisites

In this section, we demonstrate how to prepare an environment with PyTorch.

MMPretrain works on Linux, Windows and macOS. It requires Python 3.7+, CUDA 10.2+ and PyTorch 1.8+.
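The minimum versions above can be checked programmatically. The following is a minimal sketch that uses only the standard library; the helper names `parse_version` and `meets_minimum` are made up for illustration and are not part of MMPretrain, and the version strings in the example calls are sample inputs rather than values read from a real machine.

```python
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Suffixes such as "+cu113" or "rc7" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum):
    """Check whether a version string is at least `minimum` (a tuple of ints)."""
    return parse_version(version_text) >= minimum


# Minimums stated in this section: Python 3.7, PyTorch 1.8, CUDA 10.2.
print("Python OK:", sys.version_info[:2] >= (3, 7))
print("PyTorch 1.12.1+cu113 OK:", meets_minimum("1.12.1+cu113", (1, 8)))
print("CUDA 10.1 OK:", meets_minimum("10.1", (10, 2)))
```

In a real environment, the strings passed to `meets_minimum` would come from `torch.__version__` and `torch.version.cuda`.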

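If you are experienced with PyTorch and have already installed it, you can skip this part and jump to the [next section](#installation).
Otherwise, follow the steps below to prepare the environment.

Step 1. Download and install Miniconda from the official website.

Step 2. Create a conda environment and activate it.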

conda create --name openmmlab python=3.8 -y
conda activate openmmlab

Step 3. Install PyTorch following the official instructions. For example:

On GPU platforms:

conda install pytorch torchvision -c pytorch

This command automatically installs the latest version of PyTorch and the corresponding cudatoolkit. Check that they match your environment.

On CPU platforms:

conda install pytorch torchvision cpuonly -c pytorch
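After either command, it can be useful to confirm which PyTorch build was actually installed. The sketch below reports the PyTorch version, the CUDA version it was built with, and whether a GPU is usable. `describe_torch` is a hypothetical helper name; `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
import importlib.util


def describe_torch():
    """Summarize the local PyTorch install, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```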

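Installation

We recommend that users follow our best practices to install MMPretrain. If you prefer to customize the installation process, see the Customize Installation section for more information.

Best Practices

Depending on your needs, we support two installation modes: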

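- Install from source (recommended): you want to develop your own pre-training task on top of the MMPretrain framework, for example by adding new features such as new models or datasets, or you want to use the tools we provide.
- Install as a Python package: you only want to call MMPretrain's APIs or import MMPretrain modules in your own project.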

Install from source

In this case, install mmpretrain from source as follows:

git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
pip install -U openmim && mim install -e .

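`"-e"` means installing the project in editable mode, so local modifications to the code take effect without reinstallation.

Install as a Python package

Just install it with mim.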

pip install -U openmim && mim install "mmpretrain>=1.0.0rc7"

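`mim` is a lightweight command-line tool that sets up an appropriate environment for OpenMMLab repositories according to your PyTorch and CUDA versions. It also provides some useful functions for deep learning experiments.

Install multi-modality support (optional)

The multi-modality models in MMPretrain require extra dependencies. To install them, add `[multimodal]` during installation, as shown below: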

# Install from source
mim install -e ".[multimodal]"

# Install as a Python package
mim install "mmpretrain[multimodal]>=1.0.0rc7"
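To see whether the extra dependencies are importable after installation, a small check like the following may help. This is only a sketch: `missing_modules` is a hypothetical helper, and the module names in the example call (`transformers`, `pycocotools`) are an assumption about what the `[multimodal]` extra pulls in. The project's requirements files are the authoritative list.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names; adjust them to the actual requirements list.
print(missing_modules(["transformers", "pycocotools"]))
```

An empty list means every listed module can be located by the current interpreter.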

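Verify the installation

To verify that MMPretrain is installed correctly, we provide some sample code that runs model inference.

If you installed mmpretrain from source, run the following command: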

python demo/image_demo.py demo/demo.JPEG resnet18_8xb32_in1k --device cpu

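You will see the result dict printed in your terminal, including the `pred_label`, `pred_score` and `pred_class` fields.

If you installed mmpretrain as a Python package, open your Python interpreter and paste the following code: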

from mmpretrain import get_model, inference_model

model = get_model('resnet18_8xb32_in1k', device='cpu')  # or device='cuda:0'
inference_model(model, 'demo/demo.JPEG')

You will see a dict printed, containing the predicted label, score and category name.
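If you want a one-line summary instead of the raw dict, the three fields named above can be formatted as in the sketch below. `summarize` is a hypothetical helper, and the dict in the example is a hand-written stand-in with made-up values, not real model output. The sketch also assumes that `pred_label` and `pred_score` can be converted with `int()` and `float()`.

```python
def summarize(result):
    """Format the pred_class, pred_label and pred_score fields into one line."""
    label = int(result["pred_label"])
    score = float(result["pred_score"])
    return f"{result['pred_class']} (label {label}, score {score:.3f})"


# Hand-written stand-in for a real prediction, for illustration only.
fake_result = {"pred_label": 3, "pred_score": 0.75, "pred_class": "example class"}
print(summarize(fake_result))
```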

In the examples above, `resnet18_8xb32_in1k` is the model name. You can use [`mmpretrain.list_models`](mmpretrain.apis.list_models) to
explore all models, or search for them on the [Model Zoo Summary](./modelzoo_statistics.md) page.
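As a rough illustration of browsing models, the sketch below passes a wildcard pattern to `list_models`. The wrapper `find_models` is a hypothetical name; it returns `None` when mmpretrain cannot be imported, so the snippet also runs in an environment where the package is not installed yet.

```python
def find_models(pattern):
    """Return model names matching a wildcard pattern, or None without mmpretrain."""
    try:
        from mmpretrain import list_models
    except ImportError:
        return None
    return list_models(pattern)


names = find_models("*resnet18*")
print("mmpretrain is not installed" if names is None else names)
```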

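Customize Installation

CUDA versions

When installing PyTorch, you need to specify the CUDA version. If you are not sure which one to choose, follow our recommendations:

- For Ampere-based NVIDIA GPUs, such as the GeForce 30 series and the NVIDIA A100, CUDA 11 is required.
- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.

Make sure your GPU driver meets the minimum version requirement; see the driver version table in NVIDIA's CUDA Toolkit release notes.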
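The recommendation above can be expressed as a tiny rule on the GPU's compute capability. This sketch assumes that Ampere corresponds to compute capability 8.x (the A100 reports 8.0 and GeForce 30 series cards report 8.6) and treats anything at or above 8.0 as needing CUDA 11. `requires_cuda11` is a hypothetical helper; on a machine with PyTorch and a GPU, the tuple can be obtained from `torch.cuda.get_device_capability()`.

```python
def requires_cuda11(capability):
    """Apply the rule above to a (major, minor) compute capability tuple."""
    major, _minor = capability
    return major >= 8


print(requires_cuda11((8, 6)))  # e.g. a GeForce 30 series card
print(requires_cuda11((7, 5)))  # an older, pre-Ampere card
```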

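If you follow our best practices, installing the CUDA runtime libraries is enough, because we provide pre-compiled CUDA code and nothing is compiled locally.
However, if you want to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from the
[NVIDIA website](https://developer.nvidia.com/cuda-downloads), and its version must match the CUDA version PyTorch was installed with
(i.e. the cudatoolkit version specified in the `conda install` command).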
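One way to check the version match described above is to compare the release reported by `nvcc --version` with `torch.version.cuda`. The sketch below only parses text, so it runs without a GPU. The helper names are hypothetical, and the sample string is an illustrative line in the usual `nvcc` output format, not captured from a real machine.

```python
import re


def parse_nvcc_release(output):
    """Extract the "major.minor" release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def toolkit_matches(nvcc_output, torch_cuda):
    """Compare the nvcc release with the CUDA version PyTorch was built for."""
    release = parse_nvcc_release(nvcc_output)
    return release is not None and release == torch_cuda


# Illustrative sample line, not captured from a real machine.
sample = "Cuda compilation tools, release 11.3, V11.3.109"
print(toolkit_matches(sample, "11.3"))
```

In practice, `nvcc_output` would come from running `nvcc --version` (for example via `subprocess.run`) and `torch_cuda` from `torch.version.cuda`.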

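Install on CPU-only platforms

MMPretrain can be installed in a CPU-only environment. In CPU mode, you can train, test and run inference with models.

Install on Google Colab

Follow the Colab tutorial to install it.

Using MMPretrain with Docker

MMPretrain provides a Dockerfile for building an image. Make sure your Docker version is >=19.03.

# Build an image with PyTorch 1.12.1 and CUDA 11.3 by default
# If you prefer other versions, modify the Dockerfile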
docker build -t mmpretrain docker/

Run the Docker image with the following command:

docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmpretrain/data mmpretrain
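When launching the container from a script, the `{DATA_DIR}` placeholder has to be filled with a host path. The sketch below builds the same argument list in Python. `docker_run_args` is a hypothetical helper and `~/datasets` is a placeholder path. The path is resolved to an absolute one because Docker interprets a bare name in `-v` as a named volume rather than a host directory.

```python
from pathlib import Path


def docker_run_args(data_dir, image="mmpretrain", shm_size="8g"):
    """Build the argument list for the `docker run` command shown above."""
    host_path = Path(data_dir).expanduser().resolve()
    return [
        "docker", "run", "--gpus", "all", f"--shm-size={shm_size}", "-it",
        "-v", f"{host_path}:/mmpretrain/data", image,
    ]


# "~/datasets" is a placeholder path for illustration.
print(" ".join(docker_run_args("~/datasets")))
```

The returned list can be passed directly to `subprocess.run` if the command should be executed rather than printed.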

Troubleshooting

If you run into problems during installation, check the FAQ first. If you cannot find a solution there, you can open an issue on GitHub.
