yolov5/utils/loggers/wandb
Victor Sonck 378bde4bba
ClearML experiment tracking integration (#8620)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
2022-08-05 20:50:49 +02:00
README.md Add mdformat to precommit checks and update other version (#7529) 2022-04-22 13:36:27 -07:00
__init__.py Refactor train.py and val.py `loggers` (#4137) 2021-07-25 01:18:39 +02:00
log_dataset.py W&B refactor, handle exceptions, CI example (#5618) 2021-11-14 13:26:53 +01:00
sweep.py Copy wandb param dict before training to avoid overwrites (#7317) 2022-04-06 18:35:33 +02:00
sweep.yaml Update sweep.yaml (#6825) 2022-03-04 09:39:23 +01:00
wandb_utils.py ClearML experiment tracking integration (#8620) 2022-08-05 20:50:49 +02:00

README.md

📚 This guide explains how to use Weights & Biases (W&B) with YOLOv5 🚀. UPDATED 29 September 2021.

About Weights & Biases

Think of W&B like GitHub for machine learning models. With a few lines of code, save everything you need to debug, compare and reproduce your models — architecture, hyperparameters, git commits, model weights, GPU usage, and even datasets and predictions.

Used by top researchers including teams at OpenAI, Lyft, GitHub, and MILA, W&B is part of the new standard of best practices for machine learning. How W&B can help you optimize your machine learning workflows:

First-Time Setup

When you first train, W&B will prompt you to create a new account and will generate an **API key** for you. If you are an existing user, you can retrieve your key from https://wandb.ai/authorize. This key tells W&B where to log your data. You only need to supply your key once; it is then remembered on the same device.
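In non-interactive environments such as CI jobs, the key can also be supplied through the `WANDB_API_KEY` environment variable instead of the prompt. The sketch below only checks whether such a key is present before training starts. The helper name `wandb_key_available` is hypothetical and is not part of YOLOv5 or wandb, and a key remembered from an earlier login on the device is not detected by this check.

```python
import os


def wandb_key_available(env=None):
    """Return True if a non-empty WANDB_API_KEY is set in the given mapping.

    `env` defaults to the current process environment. This is an
    illustrative helper, not a YOLOv5 or wandb function.
    """
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_key_available():
        print("WANDB_API_KEY found, W&B should not prompt for a key.")
    else:
        print("No WANDB_API_KEY set, W&B may prompt for a key on first run.")
```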

W&B will create a cloud project (default is 'YOLOv5') for your training runs, and each new training run will be provided a unique run name within that project as project/name. You can also manually set your project and run name as:

$ python train.py --project ... --name ...
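When training is launched from a script rather than a shell, the same two flags can be assembled programmatically. This is a minimal sketch that only builds the argument list. The function `train_command` and the example project and run names are hypothetical, and only the `--project` and `--name` flags shown above are used.

```python
def train_command(project, name, script="train.py"):
    """Build the argument list for a training run with an explicit
    W&B project and run name (illustrative helper, not part of YOLOv5)."""
    if not project or not name:
        raise ValueError("both project and name must be non-empty")
    return ["python", script, "--project", project, "--name", name]


if __name__ == "__main__":
    # The list can be passed to subprocess.run(...) from the YOLOv5 root.
    print(" ".join(train_command("my_project", "baseline_run")))
```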

YOLOv5 notebook example: Open In Colab, Open In Kaggle

Viewing Runs

Run information streams from your environment to the W&B cloud console as you train. This allows you to monitor and even cancel runs in real time. All important information is logged:
  • Training & Validation losses
  • Metrics: Precision, Recall, mAP@0.5, mAP@0.5:0.95
  • Learning Rate over time
  • A bounding box debugging panel, showing the training progress over time
  • GPU: Type, GPU Utilization, power, temperature, CUDA memory usage
  • System: Disk I/O, CPU utilization, RAM usage
  • Your trained model as a W&B Artifact
  • Environment: OS and Python types, Git repository and state, training command

Weights & Biases dashboard

Disabling wandb

  • Training after running `wandb disabled` inside that directory creates no wandb run.

  • To enable wandb again, run `wandb online`.
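The wandb client also reads the `WANDB_MODE` environment variable, which gives a per-process alternative to the CLI commands above. The following is a minimal sketch, assuming the values `online`, `offline` and `disabled` behave as described in the wandb documentation. The helper `set_wandb_mode` is hypothetical, and the variable has to be set before wandb is initialized.

```python
import os

# Mode names assumed from the wandb documentation.
VALID_MODES = {"online", "offline", "disabled"}


def set_wandb_mode(mode, env=None):
    """Set WANDB_MODE in the given mapping (default: os.environ).

    Illustrative helper, not part of YOLOv5 or wandb.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(VALID_MODES)}")
    env = os.environ if env is None else env
    env["WANDB_MODE"] = mode
    return env


if __name__ == "__main__":
    set_wandb_mode("disabled")  # affects this process and its children only
    print(os.environ["WANDB_MODE"])
```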

Advanced Usage

You can leverage W&B artifacts and Tables integration to easily visualize and manage your datasets, models and training evaluations. Here are some quick examples to get you started.

1: Train and Log Evaluation Simultaneously

This uploads the dataset, starts training once the upload is done, and also logs an evaluation Table. The evaluation Table compares your predictions and ground truths across the validation set for each epoch. It uses references to the already uploaded dataset, so no image is uploaded from your system more than once.
Usage: $ python train.py --upload_dataset val


2: Visualize and Version Datasets

Log, visualize, dynamically query, and understand your data with W&B Tables. You can use the following command to log your dataset as a W&B Table. This generates a `{dataset}_wandb.yaml` file, which can be used to train from the dataset artifact.
Usage: $ python utils/loggers/wandb/log_dataset.py --project ... --name ... --data ...
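The naming pattern of the generated config file can be illustrated with a few lines of Python. This sketch only derives the file name from the pattern stated above; it makes no claim about the directory the file is written to. The function `wandb_config_name` is hypothetical, and `data/coco128.yaml` is just an example path.

```python
from pathlib import Path


def wandb_config_name(data_yaml):
    """Return '<stem>_wandb.yaml' for a dataset config path.

    Illustrative helper that mirrors the '{dataset}_wandb.yaml' pattern;
    it is not the YOLOv5 implementation.
    """
    return Path(data_yaml).stem + "_wandb.yaml"


if __name__ == "__main__":
    print(wandb_config_name("data/coco128.yaml"))
```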


3: Train using dataset artifact

When you upload a dataset as described in the previous section, you get a new config file with `_wandb` appended to its name. This file contains the information needed to train a model directly from the dataset artifact. This also logs evaluation.
Usage: $ python train.py --data {data}_wandb.yaml


4: Save model checkpoints as artifacts

To enable saving and versioning checkpoints of your experiment, pass `--save_period n` with the base command, where `n` is the checkpoint interval in epochs. You can also log both the dataset and model checkpoints simultaneously. If `--save_period` is not passed, only the final model will be logged.
Usage: $ python train.py --save_period 1
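To make the effect of the interval concrete, the sketch below lists the epochs at which an intermediate checkpoint would be logged. It assumes 1-based epoch numbers and a checkpoint on every `n`-th epoch; the exact indexing used inside YOLOv5 is not stated here and may differ by one. The function `checkpoint_epochs` is hypothetical.

```python
def checkpoint_epochs(total_epochs, save_period):
    """Return the 1-based epochs at which an intermediate checkpoint
    would be logged, assuming one checkpoint every `save_period` epochs.

    A period below 1 means no intermediate checkpoints (only the final
    model is logged). Illustrative helper, not YOLOv5 code.
    """
    if save_period < 1:
        return []
    return [e for e in range(1, total_epochs + 1) if e % save_period == 0]


if __name__ == "__main__":
    print(checkpoint_epochs(10, 3))
```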


5: Resume runs from checkpoint artifacts

Any run can be resumed using artifacts if the `--resume` argument starts with the `wandb-artifact://` prefix followed by the run path, i.e., `wandb-artifact://username/project/runid`. This doesn't require the model checkpoint to be present on the local system.
Usage: $ python train.py --resume wandb-artifact://{run_path}
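The structure of the resume argument can be shown with a small parser. This is a sketch under the assumption that the run path always has exactly the three components named above (username, project, run id). The names `RunPath` and `parse_resume_arg` are hypothetical and this is not the parsing code used by YOLOv5.

```python
from typing import NamedTuple, Optional

PREFIX = "wandb-artifact://"


class RunPath(NamedTuple):
    username: str
    project: str
    run_id: str


def parse_resume_arg(resume: str) -> Optional[RunPath]:
    """Split 'wandb-artifact://username/project/runid' into its parts.

    Returns None when the argument does not use the artifact prefix
    (for example a local checkpoint path) and raises ValueError when the
    prefix is present but the run path is malformed.
    """
    if not resume.startswith(PREFIX):
        return None
    parts = resume[len(PREFIX):].strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected username/project/runid, got {resume!r}")
    return RunPath(*parts)


if __name__ == "__main__":
    # 'alice', 'YOLOv5' and '1a2b3c' are made-up example values.
    print(parse_resume_arg("wandb-artifact://alice/YOLOv5/1a2b3c"))
```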


6: Resume runs from dataset artifact & checkpoint artifacts

Local dataset or model checkpoints are not required, so this can be used to resume runs directly on a different device. The syntax is the same as in the previous section, but you'll need to log both the dataset and the model checkpoints as artifacts, i.e., either set `--upload_dataset` or train from a `_wandb.yaml` file, and also set `--save_period`.
Usage: $ python train.py --resume wandb-artifact://{run_path}


Reports

W&B Reports can be created from your saved runs for sharing online. Once a report is created, you will receive a link you can use to publicly share your results. Here is an example report created from the COCO128 tutorial trainings of all four YOLOv5 models ([link](https://wandb.ai/glenn-jocher/yolov5_tutorial/reports/YOLOv5-COCO128-Tutorial-Results--VmlldzozMDI5OTY)).

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.