{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/open-mmlab/mmsegmentation/blob/main/demo/MMSegmentation_Tutorial.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FVmnaxFJvsb8"
   },
   "source": [
    "# MMSegmentation Tutorial\n",
    "Welcome to MMSegmentation!\n",
    "\n",
    "In this tutorial, we demonstrate:\n",
    "* how to run inference with pretrained MMSeg weights\n",
    "* how to train on your own dataset and visualize the results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QS8YHrEhbpas"
   },
   "source": [
    "## Install MMSegmentation\n",
    "This step may take several minutes.\n",
    "\n",
    "We use PyTorch 1.12 and CUDA 11.3 for this tutorial. You may install other versions by changing the version numbers in the pip install commands."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "UWyLrLYaNEaL",
    "outputId": "32a47fe3-f10d-47a1-f6b9-b7c235abdab1"
   },
   "outputs": [],
   "source": [
    "# Check nvcc version\n",
    "!nvcc -V\n",
    "# Check GCC version\n",
    "!gcc --version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Ki3WUBjKbutg",
    "outputId": "14bd14b0-4d8c-4fa9-e3f9-da35c0efc0d5"
   },
   "outputs": [],
   "source": [
    "# Install PyTorch\n",
    "!conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch\n",
    "# Install mim\n",
    "!pip install -U openmim\n",
    "# Install mmengine\n",
    "!mim install mmengine\n",
    "# Install MMCV\n",
    "!mim install 'mmcv >= 2.0.0rc1'\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "nR-hHRvbNJJZ",
    "outputId": "10c3b131-d4db-458c-fc10-b94b1c6ed546"
   },
   "outputs": [],
   "source": [
    "!rm -rf mmsegmentation\n",
    "!git clone -b main https://github.com/open-mmlab/mmsegmentation.git\n",
    "%cd mmsegmentation\n",
    "!pip install -e ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "mAE_h7XhPT7d",
    "outputId": "83bf0f8e-fc69-40b1-f9fe-0025724a217c"
   },
   "outputs": [],
   "source": [
    "# Check PyTorch installation\n",
    "import torch, torchvision\n",
    "print(torch.__version__, torch.cuda.is_available())\n",
    "\n",
    "# Check MMSegmentation installation\n",
    "import mmseg\n",
    "print(mmseg.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ta51clKX4cwM"
   },
   "source": [
    "## Finetune a semantic segmentation model on a new dataset\n",
    "\n",
    "To finetune on a customized dataset, the following steps are necessary:\n",
    "1. Add a new dataset class.\n",
    "2. Create a config file accordingly.\n",
    "3. Perform training and evaluation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcZg6x_K5Zs3"
   },
   "source": [
    "### Add a new dataset\n",
    "\n",
    "Datasets in MMSegmentation require images and semantic segmentation maps to be placed in folders with the same prefix. To support a new dataset, we may need to modify the original file structure.\n",
    "\n",
    "In this tutorial, we give an example of converting the dataset. You may refer to the [docs](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/tutorials/customize_datasets.md#customize-datasets-by-reorganizing-data) for details about dataset reorganization.\n",
    "\n",
    "We use the [Stanford Background Dataset](http://dags.stanford.edu/projects/scenedataset.html) as an example. The dataset contains 715 images chosen from existing public datasets: [LabelMe](http://labelme.csail.mit.edu), [MSRC](http://research.microsoft.com/en-us/projects/objectclassrecognition), [PASCAL VOC](http://pascallin.ecs.soton.ac.uk/challenges/VOC) and [Geometric Context](http://www.cs.illinois.edu/homes/dhoiem/). The images are mainly outdoor scenes, each approximately 320x240 pixels.\n",
    "In this tutorial, we use the region annotations as labels. There are 8 classes in total: sky, tree, road, grass, water, building, mountain, and foreground object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "TFIt7MHq5Wls",
    "outputId": "74a126e4-c8a4-4d2f-a910-b58b71843a23"
   },
   "outputs": [],
   "source": [
    "# download and unzip\n",
    "!wget http://dags.stanford.edu/data/iccv09Data.tar.gz -O stanford_background.tar.gz\n",
    "!tar xf stanford_background.tar.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 377
    },
    "id": "78LIci7F9WWI",
    "outputId": "c432ddac-5a50-47b1-daac-5a26b07afea2"
   },
   "outputs": [],
   "source": [
    "# Let's take a look at the dataset\n",
    "import mmcv\n",
    "import mmengine\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "img = mmcv.imread('iccv09Data/images/6000124.jpg')\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.imshow(mmcv.bgr2rgb(img))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L5mNQuc2GsVE"
   },
   "source": [
    "We need to convert the annotations into the semantic map format, saved as images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WnGZfribFHCx"
   },
   "outputs": [],
   "source": [
    "# define the dataset root and directories for images and annotations\n",
    "data_root = 'iccv09Data'\n",
    "img_dir = 'images'\n",
    "ann_dir = 'labels'\n",
    "# define classes and palette for better visualization\n",
    "classes = ('sky', 'tree', 'road', 'grass', 'water', 'bldg', 'mntn', 'fg obj')\n",
    "palette = [[128, 128, 128], [129, 127, 38], [120, 69, 125], [53, 125, 34],\n",
    "           [0, 11, 123], [118, 20, 12], [122, 81, 25], [241, 134, 51]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WnGZfribFHCx"
   },
   "outputs": [],
   "source": [
    "import os.path as osp\n",
    "import numpy as np\n",
    "from PIL import Image\n",
    "\n",
    "# convert the dataset annotations to semantic segmentation maps\n",
    "for file in mmengine.scandir(osp.join(data_root, ann_dir), suffix='.regions.txt'):\n",
    "  seg_map = np.loadtxt(osp.join(data_root, ann_dir, file)).astype(np.uint8)\n",
    "  seg_img = Image.fromarray(seg_map).convert('P')\n",
    "  seg_img.putpalette(np.array(palette, dtype=np.uint8))\n",
    "  seg_img.save(osp.join(data_root, ann_dir,\n",
    "                        file.replace('.regions.txt', '.png')))"
   ]
  },
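  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check (a small snippet added for illustration, not part of the original tutorial flow), we can count the generated PNG maps; there should be one map per image, i.e. 715."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# count the converted segmentation maps to confirm the conversion ran\n",
    "n_png = len(list(mmengine.scandir(osp.join(data_root, ann_dir), suffix='.png')))\n",
    "print(f'{n_png} segmentation maps generated')"
   ]
  },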
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 377
    },
    "id": "5MCSS9ABfSks",
    "outputId": "92b9bafc-589e-48fc-c9e9-476f125d6522"
   },
   "outputs": [],
   "source": [
    "# Let's take a look at the segmentation map we got\n",
    "import matplotlib.patches as mpatches\n",
    "img = Image.open('iccv09Data/labels/6000124.png')\n",
    "plt.figure(figsize=(8, 6))\n",
    "im = plt.imshow(np.array(img.convert('RGB')))\n",
    "\n",
    "# create a patch (proxy artist) for every color\n",
    "patches = [mpatches.Patch(color=np.array(palette[i]) / 255.,\n",
    "                          label=classes[i]) for i in range(8)]\n",
    "# put those patches as legend handles into the legend\n",
    "plt.legend(handles=patches, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,\n",
    "           fontsize='large')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WbeLYCp2k5hl"
   },
   "outputs": [],
   "source": [
    "# split the dataset into train/val sets\n",
    "split_dir = 'splits'\n",
    "mmengine.mkdir_or_exist(osp.join(data_root, split_dir))\n",
    "filename_list = [osp.splitext(filename)[0] for filename in mmengine.scandir(\n",
    "    osp.join(data_root, ann_dir), suffix='.png')]\n",
    "with open(osp.join(data_root, split_dir, 'train.txt'), 'w') as f:\n",
    "  # select the first 4/5 as the train set\n",
    "  train_length = int(len(filename_list) * 4 / 5)\n",
    "  f.writelines(line + '\\n' for line in filename_list[:train_length])\n",
    "with open(osp.join(data_root, split_dir, 'val.txt'), 'w') as f:\n",
    "  # select the last 1/5 as the val set\n",
    "  f.writelines(line + '\\n' for line in filename_list[train_length:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HchvmGYB_rrO"
   },
   "source": [
    "After downloading the data, we need to implement the new dataset class `StanfordBackgroundDataset`. It inherits from `BaseSegDataset` and specifies the class names, the palette, and the image/annotation file suffixes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "LbsWOw62_o-X"
   },
   "outputs": [],
   "source": [
    "from mmseg.registry import DATASETS\n",
    "from mmseg.datasets import BaseSegDataset\n",
    "\n",
    "\n",
    "@DATASETS.register_module()\n",
    "class StanfordBackgroundDataset(BaseSegDataset):\n",
    "  METAINFO = dict(classes=classes, palette=palette)\n",
    "  def __init__(self, **kwargs):\n",
    "    super().__init__(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)"
   ]
  },
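  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (a hypothetical snippet added for illustration), we can instantiate the new dataset class directly on the train split and inspect one sample; `get_data_info` comes from the mmengine `BaseDataset` that `BaseSegDataset` builds on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# build the dataset with an empty pipeline, since we only want to\n",
    "# inspect the resolved file lists here\n",
    "ds = StanfordBackgroundDataset(\n",
    "    data_root=data_root,\n",
    "    data_prefix=dict(img_path=img_dir, seg_map_path=ann_dir),\n",
    "    ann_file='splits/train.txt',\n",
    "    pipeline=[])\n",
    "print(f'{len(ds)} training samples')\n",
    "print(ds.get_data_info(0))"
   ]
  },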
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yUVtmn3Iq3WA"
   },
   "source": [
    "### Create a config file\n",
    "In the next step, we need to modify the config for training. To accelerate the process, we finetune the model from pretrained weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download config and checkpoint files\n",
    "!mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest ."
   ]
  },
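  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before finetuning, we can run inference with the downloaded pretrained weights to verify the setup. This is a minimal sketch: it assumes the config and checkpoint filenames fetched by the `mim download` command above, and it reuses the sample image from the Stanford Background dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mmseg.apis import init_model, inference_model, show_result_pyplot\n",
    "\n",
    "# filenames as fetched by `mim download` above\n",
    "config_file = 'pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'\n",
    "checkpoint_file = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'\n",
    "\n",
    "# build the model from the config and the Cityscapes checkpoint\n",
    "model = init_model(config_file, checkpoint_file, device='cuda:0')\n",
    "\n",
    "# run inference on a sample image and visualize the (Cityscapes-label) prediction\n",
    "result = inference_model(model, 'iccv09Data/images/6000124.jpg')\n",
    "vis = show_result_pyplot(model, 'iccv09Data/images/6000124.jpg', result)"
   ]
  },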
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Wwnj9tRzqX_A"
   },
   "outputs": [],
   "source": [
    "from mmengine import Config\n",
    "cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py')\n",
    "print(f'Config:\\n{cfg.pretty_text}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1y2oV5w97jQo"
   },
   "source": [
    "Since the given config is used to train PSPNet on the Cityscapes dataset, we need to modify it accordingly for our new dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "eyKnYC1Z7iCV",
    "outputId": "6195217b-187f-4675-994b-ba90d8bb3078"
   },
   "outputs": [],
   "source": [
    "# Since we use only one GPU, BN is used instead of SyncBN\n",
    "cfg.norm_cfg = dict(type='BN', requires_grad=True)\n",
    "cfg.crop_size = (256, 256)\n",
    "cfg.model.data_preprocessor.size = cfg.crop_size\n",
    "cfg.model.backbone.norm_cfg = cfg.norm_cfg\n",
    "cfg.model.decode_head.norm_cfg = cfg.norm_cfg\n",
    "cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg\n",
    "# modify the number of classes in the decode/auxiliary heads\n",
    "cfg.model.decode_head.num_classes = 8\n",
    "cfg.model.auxiliary_head.num_classes = 8\n",
    "\n",
    "# Modify dataset type and path\n",
    "cfg.dataset_type = 'StanfordBackgroundDataset'\n",
    "cfg.data_root = data_root\n",
    "\n",
    "cfg.train_dataloader.batch_size = 8\n",
    "\n",
    "cfg.train_pipeline = [\n",
    "    dict(type='LoadImageFromFile'),\n",
    "    dict(type='LoadAnnotations'),\n",
    "    dict(type='RandomResize', scale=(320, 240), ratio_range=(0.5, 2.0), keep_ratio=True),\n",
    "    dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=0.75),\n",
    "    dict(type='RandomFlip', prob=0.5),\n",
    "    dict(type='PackSegInputs')\n",
    "]\n",
    "\n",
    "cfg.test_pipeline = [\n",
    "    dict(type='LoadImageFromFile'),\n",
    "    dict(type='Resize', scale=(320, 240), keep_ratio=True),\n",
    "    # load annotations after ``Resize`` because the ground truth\n",
    "    # does not need to be resized\n",
    "    dict(type='LoadAnnotations'),\n",
    "    dict(type='PackSegInputs')\n",
    "]\n",
    "\n",
    "cfg.train_dataloader.dataset.type = cfg.dataset_type\n",
    "cfg.train_dataloader.dataset.data_root = cfg.data_root\n",
    "cfg.train_dataloader.dataset.data_prefix = dict(img_path=img_dir, seg_map_path=ann_dir)\n",
    "cfg.train_dataloader.dataset.pipeline = cfg.train_pipeline\n",
    "cfg.train_dataloader.dataset.ann_file = 'splits/train.txt'\n",
    "\n",
    "cfg.val_dataloader.dataset.type = cfg.dataset_type\n",
    "cfg.val_dataloader.dataset.data_root = cfg.data_root\n",
    "cfg.val_dataloader.dataset.data_prefix = dict(img_path=img_dir, seg_map_path=ann_dir)\n",
    "cfg.val_dataloader.dataset.pipeline = cfg.test_pipeline\n",
    "cfg.val_dataloader.dataset.ann_file = 'splits/val.txt'\n",
    "\n",
    "cfg.test_dataloader = cfg.val_dataloader\n",
    "\n",
    "# Load the pretrained weights\n",
    "cfg.load_from = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'\n",
    "\n",
    "# Set up the working dir to save files and logs\n",
    "cfg.work_dir = './work_dirs/tutorial'\n",
    "\n",
    "cfg.train_cfg.max_iters = 200\n",
    "cfg.train_cfg.val_interval = 200\n",
    "cfg.default_hooks.logger.interval = 10\n",
    "cfg.default_hooks.checkpoint.interval = 200\n",
    "\n",
    "# Set the seed to facilitate reproducing the result\n",
    "cfg['randomness'] = dict(seed=0)\n",
    "\n",
    "# Let's have a look at the final config used for training\n",
    "print(f'Config:\\n{cfg.pretty_text}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QWuH14LYF2gQ"
   },
   "source": [
    "### Training and Evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "jYKoSfdMF12B",
    "outputId": "422219ca-d7a5-4890-f09f-88c959942e64"
   },
   "outputs": [],
   "source": [
    "from mmengine.runner import Runner\n",
    "\n",
    "runner = Runner.from_cfg(cfg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# start training\n",
    "runner.train()"
   ]
  },
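  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, we can re-run evaluation on the validation split after training (a minimal sketch; `Runner.val()` reuses `cfg.val_dataloader` and the configured evaluator)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# evaluate the trained model on the validation split\n",
    "runner.val()"
   ]
  },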
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DEkWOP-NMbc_"
   },
   "source": [
    "Run inference with the trained model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 645
    },
    "id": "ekG__UfaH_OU",
    "outputId": "1437419c-869a-4902-df86-d4f6f8b2597a"
   },
   "outputs": [],
   "source": [
    "from mmseg.apis import init_model, inference_model, show_result_pyplot\n",
    "\n",
    "# Init the model from the config and the checkpoint\n",
    "checkpoint_path = './work_dirs/tutorial/iter_200.pth'\n",
    "model = init_model(cfg, checkpoint_path, 'cuda:0')\n",
    "\n",
    "img = mmcv.imread('iccv09Data/images/6000124.jpg')\n",
    "result = inference_model(model, img)\n",
    "plt.figure(figsize=(8, 6))\n",
    "vis_result = show_result_pyplot(model, img, result)\n",
    "plt.imshow(mmcv.bgr2rgb(vis_result))\n"
   ]
  },
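  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To keep the prediction for later inspection, we can save the visualization to the work directory (a small snippet added for illustration; it assumes, as the cell above does, that `show_result_pyplot` returns the drawn image as a BGR array)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the visualized prediction next to the checkpoints and logs\n",
    "mmcv.imwrite(vis_result, osp.join(cfg.work_dir, 'pred_6000124.png'))"
   ]
  }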
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "MMSegmentation Tutorial.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.10.6 ('pt1.12')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  },
  "vscode": {
   "interpreter": {
    "hash": "0442e67aee3d9cbb788fa6e86d60c4ffa94ad7f1943c65abfecb99a6f4696c58"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}