* Update dataloaders.py
This is to address (and hopefully fix) this issue: Multi-GPU DDP RAM multiple-cache bug #3818 (https://github.com/ultralytics/yolov5/issues/3818). This was a very serious and "blocking" issue until I could figure out what was going on. The problem was especially bad when running Multi-GPU jobs with 8 GPUs: RAM usage was 8x higher than expected (!), causing repeated OOM failures. Hopefully this fix will help others.
DDP causes each RANK to launch its own process (one for each GPU) with its own trainloader and its own RAM image cache. The DistributedSampler used by DDP (https://github.com/pytorch/pytorch/blob/master/torch/utils/data/distributed.py) feeds only a subset of images (1/WORLD_SIZE) to each available GPU on each epoch, but since the images are shuffled between epochs, each GPU process must still cache all images. So I created a subclass of DistributedSampler called SmartDistributedSampler that forces each GPU process to always sample the same subset (using modulo arithmetic with RANK and WORLD_SIZE) while still allowing random shuffling between epochs. I don't believe this disrupts the overall "randomness" of the sampling, and I haven't noticed any performance degradation.
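The idea of a fixed per-rank subset that is still reshuffled every epoch can be illustrated with a small pure-Python sketch. This is not the code in dataloaders.py: the function names `rank_subset` and `epoch_order`, and the use of `random.Random(seed + epoch)`, are assumptions made only for illustration.

```python
import random


def rank_subset(num_images, rank, world_size):
    """Fixed subset for one rank: every index i with i % world_size == rank."""
    return list(range(rank, num_images, world_size))


def epoch_order(num_images, rank, world_size, epoch, seed=0):
    """Shuffle the rank's fixed subset with an epoch-dependent seed.

    The set of indices never changes between epochs, only their order,
    so each process only ever needs to cache its own 1/world_size share.
    """
    subset = rank_subset(num_images, rank, world_size)
    random.Random(seed + epoch).shuffle(subset)
    return subset


# Rank 1 of 4 sees the same images in every epoch, possibly in a different order.
epoch0 = epoch_order(10, rank=1, world_size=4, epoch=0)
epoch1 = epoch_order(10, rank=1, world_size=4, epoch=1)
```

Across all ranks the subsets are disjoint and together cover the whole dataset, which is why the total cached RAM should stay close to that of a single full cache.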
Signed-off-by: davidsvaughn <davidsvaughn@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update dataloaders.py
move extra parameter (rank) to the end so it won't break pre-existing positional args
* Update dataloaders.py
removing extra '#'
* Update dataloaders.py
sample from DDP index array (self.idx) in mixup mosaic
* Merging self.indices and self.idx (DDP indices) into single attribute (self.indices).
Also adding SmartDistributedSampler to segmentation dataloader
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Multiply GB displayed by WORLD_SIZE
---------
Signed-off-by: davidsvaughn <davidsvaughn@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Added ClearML instance segmentation and classification support
* Cleaned up ClearML plot output
* typos
* Log results as plots instead of debug samples
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* improving evolve in train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix gen_ranges value in mutation part.
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* fix invalid syntax in line 532
remove one tab from "else"
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* fix range index
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* Update train.py
fix population size
add crossover min and max rate
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update comments
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* save population for last generation
All hyperparameters are now stored in the population section of "evolve_population.yaml", located in "yolov5\data\hyps", after each transition to a new generation. This allows a previously abandoned evolution run to be continued from its former population. A new argument, "--evolve_population", lets the user place the "evolve_population.yaml" file in any project directory and load it from there for this purpose, which makes resuming an evolution run more flexible.
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* remove try - except
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update train.py
Add resume_evolve arg to **resume evolve from last generation**.
Population will load from data/hyp by default, loading all YAML files found there.
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* Update train.py
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* Update README.zh-CN.md
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
* Update train.py
update pop_size
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
---------
Signed-off-by: Shayan Mousavinia <45814390+ShAmoNiA@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* added imagenet small versions 10, 100 and 1000
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix bug where multi-webcam detection failed with OpenVINO
Detection with multiple webcams would fail with the following error:
"Input blob size is not equal network input size (2457600!=1228800)"
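The two numbers in the error message differ by exactly a factor of two, which is consistent with a batch of two webcam frames being fed to a network compiled for a batch of one. The sketch below only checks that arithmetic; the 3x640x640 input shape is an assumption (it is not stated in this commit), and `blob_size` is a hypothetical helper.

```python
from math import prod


def blob_size(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


# Assumed shapes: (batch, channels, height, width)
network_input = blob_size((1, 3, 640, 640))  # what the network expects
two_webcams = blob_size((2, 3, 640, 640))    # what two stacked streams provide
```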
Signed-off-by: Chao Qin <chaoqin_cmkj@163.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Chao Qin <chaoqin_cmkj@163.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update selectable device Profile
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix bug in #12457.
Running 'python.exe segment/predict.py --visualize' throws AttributeError: 'tuple' object has no attribute 'shape'
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Add ndjson logging for training
This adds support for NDJSON (newline-delimited JSON) metrics logging,
for both console (stdout) output and a file (like the current CSV file).
NDJSON can be easily grepped from the output and/or parsed with e.g. `jq`.
The feature is enabled with the `--ndjson-console` and `--ndjson-file`
switches to `train.py`.
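As a rough illustration of the format (not the logger added in this commit), NDJSON simply means one JSON object per line. The helper name `log_ndjson` and the metric keys below are hypothetical.

```python
import io
import json
import sys


def log_ndjson(metrics, stream=sys.stdout):
    """Write one metrics dict as a single JSON line (NDJSON)."""
    stream.write(json.dumps(metrics, sort_keys=True) + "\n")
    stream.flush()


# Usage: write two epochs to an in-memory buffer, then parse them back.
buf = io.StringIO()
log_ndjson({"epoch": 0, "loss": 0.9}, stream=buf)
log_ndjson({"epoch": 1, "loss": 0.7}, stream=buf)
records = [json.loads(line) for line in buf.getvalue().splitlines()]
```

Because every line is a complete JSON document, the output can be filtered line by line with tools such as `grep` or `jq` without loading the whole file.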
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update val.py
When saving predicted labels, create a folder named labels.
Signed-off-by: Ryan <35791309+Gary55555@users.noreply.github.com>
* Update val.py
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Ryan <35791309+Gary55555@users.noreply.github.com>
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add option to quantize per-tensor
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Parametrize multiple of number of channels in Conv
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix issue when exporting
Signed-off-by: Angelo Delli Santi <dellisanti.angelo@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Angelo Delli Santi <dellisanti.angelo@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Limit tensorflow version and add checks
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Moving check in export script
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
fix: requirements.txt to reduce vulnerabilities
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-5918878
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
* Increase NCCL timeout to 3 hours
When training on a large dataset using DDP, the dataset scanning step can take very long and raise an NCCL timeout error. Change the default timeout from 30 minutes to 3 hours, the same as ultralytics yolov8 (https://github.com/ultralytics/ultralytics/pull/3343)
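A minimal sketch of how such a timeout can be passed to PyTorch, assuming the process group is created with `torch.distributed.init_process_group`; the wrapper name `init_ddp` and the constant `NCCL_TIMEOUT` are hypothetical and not taken from train.py.

```python
from datetime import timedelta

# 3 hours, i.e. six times the 30-minute default mentioned above
NCCL_TIMEOUT = timedelta(hours=3)


def init_ddp(backend="nccl"):
    """Create the default process group with the longer timeout.

    torch is imported lazily so this sketch can be read and imported
    on a machine without PyTorch installed.
    """
    import torch.distributed as dist

    dist.init_process_group(backend=backend, timeout=NCCL_TIMEOUT)
```

`init_ddp()` would be called once per process after the usual DDP environment variables (RANK, WORLD_SIZE, and so on) are set by the launcher.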
Signed-off-by: Troy <wudashuo@vip.qq.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Troy <wudashuo@vip.qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Created using Colaboratory
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
fix: requirements.txt to reduce vulnerabilities
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-5918878
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
fix: requirements.txt to reduce vulnerabilities
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-5918878
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
fix: requirements.txt to reduce vulnerabilities
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-6043904
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
fix: requirements.txt to reduce vulnerabilities
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-5918878
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
fix: utils/google_app_engine/additional_requirements.txt to reduce vulnerabilities
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-WERKZEUG-6035177
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
* Update social media links
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update export.py
Signed-off-by: Luis Filipe Araujo de Souza <58831491+Doquey@users.noreply.github.com>
* Update export.py
Signed-off-by: Luis Filipe Araujo de Souza <58831491+Doquey@users.noreply.github.com>
* Update export.py
Converted the f variable to a string in the ONNX export. This bug made it impossible to export any model to .onnx, because the type hint did not accept the user's input even though it matched what the function's documentation specifies
Signed-off-by: Luis Filipe Araujo de Souza <58831491+Doquey@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Luis Filipe Araujo de Souza <58831491+Doquey@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>