Commit Graph

283 Commits (a4e8f78c5eba7500ba36f70c805ce76de5b4b0a9)

Author SHA1 Message Date
Glenn Jocher 50a9828679
DDP `torch.jit.trace()` `--sync-bn` fix (#4615)
* Remove assert

* debug0

* trace=not opt.sync

* sync to sync_bn fix

* Cleanup
2021-08-30 18:35:07 +02:00
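A minimal sketch of what the `trace=not opt.sync` / `sync_bn` bullets point at: skipping `torch.jit.trace()` for the TensorBoard graph when `--sync-bn` is active, since traced modules and `SyncBatchNorm` do not mix under DDP. The `log_tensorboard_graph` name and the warning text are illustrative assumptions, not the repository's exact code.

```python
import warnings

import torch

def log_tensorboard_graph(tb_writer, model, imgsz=640, sync_bn=False):
    # Skip jit tracing when SyncBatchNorm is enabled (trace is incompatible with it)
    if sync_bn:
        warnings.warn('--sync-bn enabled, skipping TensorBoard graph trace')
        return
    im = torch.zeros(1, 3, imgsz, imgsz)  # dummy input image
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')  # suppress jit trace warnings
        tb_writer.add_graph(torch.jit.trace(model, im, strict=False), [])
```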
Glenn Jocher de44376d1b
Create `Annotator()` class (#4591)
* Add Annotator() class

* Download Arial

* 2x for loop

* Cleanup

* tuple 2 list

* max_size=1920

* bold logging results to

* tolist()

* im = annotator.im

* PIL save in detect.py

* Smart asarray in detect.py

* revert to cv2.imwrite

* Cleanup

* Return result asarray

* Add `Profile()` profiler

* CamelCase Timeout

* Resize after mosaic

* pillow>=8.0.0

* daemon imwrite

* Add cv2 support

* Remove plot_wh_methods and plot_one_box

* pil=False for hubconf.py annotations

* im.shape bug fix

* colorstr common.py

* join daemons

* Update t.daemon

* Removed daemon saving
2021-08-29 16:46:13 +02:00
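A hypothetical PIL-only sketch of what an `Annotator()` class along these lines might look like; per the bullets, the real class also gained a cv2 path, an Arial download, and daemon-thread saving, all omitted here.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

class Annotator:
    # Hypothetical sketch: draw boxes and labels on an image via PIL
    def __init__(self, im, line_width=3):
        self.im = Image.fromarray(im) if isinstance(im, np.ndarray) else im
        self.draw = ImageDraw.Draw(self.im)
        self.font = ImageFont.load_default()
        self.lw = line_width

    def box_label(self, box, label='', color=(255, 0, 0)):
        # box is xyxy; draw the rectangle, then the label on a filled background
        self.draw.rectangle(box, width=self.lw, outline=color)
        if label:
            w, h = self.font.getsize(label)  # getsize is valid for the pillow>=8.0.0 pin
            self.draw.rectangle((box[0], box[1] - h, box[0] + w, box[1]), fill=color)
            self.draw.text((box[0], box[1] - h), label, fill=(255, 255, 255), font=self.font)

    def result(self):
        return np.asarray(self.im)  # annotated image back as a numpy array
```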
Glenn Jocher d7aa3f153d
Remove `image_weights` DDP code (#4579)
* Initial commit

* Update
2021-08-28 19:17:21 +02:00
Glenn Jocher 93cc015748
Add EarlyStopping feature (#4576)
* Add EarlyStopping feature

* Add comment

* Cleanup

* Cleanup2

* debug

* debug2

* debug3

* debug3

* debug4

* debug5

* debug6

* debug7

* debug8

* debug9

* debug10

* debug11

* debug12

* Cleanup

* Add TODO for known DDP issue
2021-08-28 19:03:52 +02:00
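A minimal sketch of an early-stopping helper consistent with this feature, assuming a scalar fitness metric where higher is better; the `patience` default is an illustrative assumption.

```python
class EarlyStopping:
    # Hypothetical sketch: stop when fitness fails to improve for `patience` epochs
    def __init__(self, patience=30):
        self.best_fitness = 0.0
        self.best_epoch = 0
        self.patience = patience

    def __call__(self, epoch, fitness):
        if fitness >= self.best_fitness:  # >= to tolerate flat early epochs
            self.best_epoch = epoch
            self.best_fitness = fitness
        return epoch - self.best_epoch >= self.patience  # True -> stop training
```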
Glenn Jocher 19d03a955c
Remove DDP process group timeout (#4422) 2021-08-15 18:32:41 +02:00
Glenn Jocher 24bea5e4b7
Standardize headers and docstrings (#4417)
* Implement new headers

* Reformat 1

* Reformat 2

* Reformat 3 - math

* Reformat 4 - yaml
2021-08-14 21:17:51 +02:00
Glenn Jocher ce7deec440
`int(mlc)` (#4385) 2021-08-11 17:32:13 +02:00
Glenn Jocher 86c7150cfd
Update newline (#4308) 2021-08-04 17:41:38 +02:00
Glenn Jocher e78aeac973
Evolve in CSV format (#4307)
* Update evolution to CSV format

* Update

* Update

* Update

* Update

* Update

* reset args

* reset args

* reset args

* plot_results() fix

* Cleanup

* Cleanup2
2021-08-04 17:13:38 +02:00
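A hypothetical sketch of the CSV-format change: each evolution generation appends one row of fitness plus hyperparameters to an `evolve.csv`, and the best row seeds the next mutation. Names and column order here are assumptions.

```python
import csv
from pathlib import Path

def save_generation(evolve_csv, fitness, hyp):
    # One evolve.csv row per generation: fitness first, then hyperparameters
    new = not Path(evolve_csv).exists()
    with open(evolve_csv, 'a', newline='') as f:
        w = csv.writer(f)
        if new:
            w.writerow(['fitness'] + list(hyp))  # header written once
        w.writerow([fitness] + [hyp[k] for k in hyp])

def best_generation(evolve_csv):
    # Pick the highest-fitness row as the parent for the next mutation
    with open(evolve_csv, newline='') as f:
        rows = list(csv.DictReader(f))
    return max(rows, key=lambda r: float(r['fitness']))
```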
junji hashimoto 2d99063201
Feature `python train.py --cache disk` (#4049)
* Add cache-on-disk and cache-directory to cache images on disk

* Fix load_image with cache_on_disk

* Add no_cache flag for load_image

* Revert the parts ('logging' and a newline) that do not need to be modified

* Add the assertion for shapes of cached images

* Add a suffix string for cached images

* Fix boundary-error of letterbox for load_mosaic

* Add prefix as cache-key of cache-on-disk

* Update cache-function on disk

* Add psutil in requirements.txt

* Update train.py

* Cleanup1

* Cleanup2

* Skip existing npy

* Include re-space

* Export return character fix

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
2021-08-02 18:47:24 +02:00
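A minimal sketch of the `--cache disk` idea the bullets describe: decode each image once, save it as `.npy`, and skip existing `.npy` files on later epochs. The function name and cache location are illustrative assumptions.

```python
from pathlib import Path

import cv2
import numpy as np

def load_image_cached(path, cache_dir=None):
    # Hypothetical sketch: reuse a decoded .npy copy of the image when one exists
    npy = Path(cache_dir or Path(path).parent) / (Path(path).stem + '.npy')
    if npy.exists():  # skip existing npy, as the PR bullets describe
        return np.load(npy)
    im = cv2.imread(str(path))  # BGR
    assert im is not None, f'Image Not Found {path}'
    np.save(npy, im)  # cache on disk for subsequent epochs
    return im
```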
Kalen Michael b74929c910
Add `train.py` and `val.py` callbacks (#4220)
* added callbacks

* Update callbacks.py

* Update train.py

* Update val.py

* Fix CamelCase, add staticmethod

* Refactor logger into callbacks

* Cleanup

* New callback on_val_image_end()

* Add curves and results images to TensorBoard

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
2021-08-01 00:18:07 +02:00
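A hypothetical sketch of a callback registry like the one this PR introduces, with `on_val_image_end()` among the hooks as the bullets mention; the other hook names are assumptions.

```python
class Callbacks:
    # Hypothetical sketch: register callables on named hooks, fire them during training
    def __init__(self):
        self._callbacks = {'on_train_start': [], 'on_val_image_end': [], 'on_fit_epoch_end': []}

    def register_action(self, hook, callback):
        self._callbacks[hook].append(callback)

    def run(self, hook, *args, **kwargs):
        for cb in self._callbacks.get(hook, []):
            cb(*args, **kwargs)  # fire every callback registered on this hook
```

train.py and val.py would then fire, e.g., `callbacks.run('on_val_image_end', ...)` at the matching points in their loops, letting the loggers live behind the hooks rather than inline.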
IneovaAI bceb57b910
Add `python train.py --freeze N` argument (#4238)
* Add freeze as an argument

I train on different platforms, and sometimes I want to freeze some layers. I have to go into the code to change it and also keep track of how many layers I froze on each platform. Please add the number of layers to freeze as an argument in future versions, thanks.

* Update train.py

* Update train.py

* Cleanup

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
2021-07-30 17:39:48 +02:00
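A minimal sketch of how freezing the first N layers typically works in this codebase's style, assuming parameters are named with `model.{i}.` prefixes; the helper name is an assumption.

```python
def freeze_layers(model, n=10):
    # Hypothetical sketch of `--freeze N`: freeze the first n layers by name prefix
    freeze = [f'model.{x}.' for x in range(n)]  # e.g. n=10 freezes layers 0-9
    for k, v in model.named_parameters():
        v.requires_grad = True  # train all layers by default
        if any(x in k for x in freeze):
            print(f'freezing {k}')
            v.requires_grad = False  # frozen params receive no gradient updates
```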
Glenn Jocher 8d3c3ef45c
Fix weight decay comment (#4228) 2021-07-30 01:35:39 +02:00
Glenn Jocher c2c958c350
Explicit `requirements.txt` location (#4225) 2021-07-29 17:29:39 +02:00
Glenn Jocher b60b62e874
PyCharm reformat (#4209)
* PyCharm reformat

* YAML reformat

* Markdown reformat
2021-07-28 23:35:14 +02:00
Ayush Chaurasia e88e8f7a98
W&B: Restructure code to support the new dataset_check() feature (#4197)
* Improve docstrings and run names

* default wandb login prompt with timeout

* return key

* Update api_key check logic

* Properly support zipped dataset feature

* update docstring

* Revert tutorial change

* extend changes to log_dataset

* add run name

* bug fix

* bug fix

* Update comment

* fix import check

* remove unused import

* Hardcode .yaml file extension

* reduce code

* Reformat using pycharm

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
2021-07-28 17:40:08 +02:00
Glenn Jocher 5d66e48723
Train from `--data path/to/dataset.zip` feature (#4185)
* Train from `--data path/to/dataset.zip` feature

* Update dataset_stats()

* cleanup

* cleanup2
2021-07-28 02:04:10 +02:00
Glenn Jocher 0ad6301c96
Update script headers (#4163)
* Update download script headers

* cleanup

* bug fix attempt

* bug fix attempt2

* bug fix attempt3

* cleanup
2021-07-26 15:23:33 +02:00
Glenn Jocher 96e36a7c91
New CSV Logger (#4148)
* New CSV Logger

* cleanup

* move batch plots into Logger

* rename comment

* Remove total loss from progress bar

* mloss :-1 bug fix

* Update plot_results()

* Update plot_results()

* plot_results bug fix
2021-07-25 19:06:37 +02:00
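A hypothetical sketch of a per-epoch CSV logger consistent with this change: one `results.csv` row per epoch, header written on first use. The function name and metric keys are assumptions.

```python
import csv
from pathlib import Path

def log_epoch_csv(save_dir, epoch, metrics):
    # Hypothetical sketch: append one row per epoch to results.csv
    file = Path(save_dir) / 'results.csv'
    new = not file.exists()
    with open(file, 'a', newline='') as f:
        w = csv.writer(f)
        if new:
            w.writerow(['epoch'] + list(metrics))  # header on first write
        w.writerow([epoch] + list(metrics.values()))

# e.g. log_epoch_csv('runs/train/exp', 0, {'train/box_loss': 0.043, 'metrics/mAP_0.5': 0.71})
```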
Glenn Jocher efe60b5681
Refactor train.py and val.py `loggers` (#4137)
* Update loggers

* Config

* Update val.py

* cleanup

* fix1

* fix2

* fix3 and reformat

* format sweep.py

* Logger() class

* cleanup

* cleanup2

* wandb package import fix

* wandb package import fix2

* txt fix

* fix4

* fix5

* fix6

* drop wandb into utils/loggers

* fix 7

* rename loggers/wandb_logging to loggers/wandb

* Update message

* Update message

* Update message

* cleanup

* Fix x axis bug

* fix rank 0 issue

* cleanup
2021-07-25 01:18:39 +02:00
Glenn Jocher 63dd65e7ed
Update train.py (#4136)
* Refactor train.py

* Update imports

* Update imports

* Update optimizer

* cleanup
2021-07-24 16:11:39 +02:00
Glenn Jocher 2c073cd207
Add `train.py` `--img-size` floor (#4099) 2021-07-21 16:50:47 +02:00
Glenn Jocher f7d8562060
`val.py` refactor (#4053)
* val.py refactor

* cleanup

* cleanup

* cleanup

* cleanup

* save after eval

* opt.imgsz bug fix

* wandb refactor

* dataloader to train_loader

* capitalize global variables

* runs/hub/exp to runs/detect/exp

* refactor wandb logging

* Refactor wandb operations (#4061)

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
2021-07-19 10:43:01 +02:00
Glenn Jocher 951922c735
Add `--sync-bn` known issue (#4032)
* Add `--sync-bn` known issue

* Update train.py
2021-07-17 13:07:19 +02:00
Glenn Jocher 720aaa65c8
Rename `test.py` to `val.py` (#4000) 2021-07-14 15:43:54 +02:00
Eldar Kurtic e7888af94c
Fix inconsistent NMS IoU value for COCO (#3934)
Evaluation of 'best' and 'last' models will use the same params as the evaluation during the training phase. 
This PR fixes https://github.com/ultralytics/yolov5/issues/3907
2021-07-08 15:29:02 +02:00
Glenn Jocher 8930e22cce
Evolution commented `hyp['anchors']` fix (#3887)
Fix for the `KeyError: 'anchors'` error when starting hyperparameter evolution:
```bash
python train.py --evolve
```

```bash
Traceback (most recent call last):
  File "E:\yolov5\train.py", line 623, in <module>
    hyp[k] = max(hyp[k], v[1])  # lower limit
KeyError: 'anchors'
```
2021-07-05 12:48:27 +02:00
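The error arises because `anchors` is commented out in the default hyp YAML, so the key is missing from `hyp` while still present in the evolution metadata. A guard of roughly this shape (a sketch, not the PR's exact diff) avoids the `KeyError`:

```python
def apply_limits(hyp, meta):
    # Hypothetical sketch; meta maps each key to (mutation_scale, lower_limit, upper_limit)
    for k, v in meta.items():
        if k in hyp:  # guard: 'anchors' may be commented out of the hyp yaml
            hyp[k] = max(hyp[k], v[1])  # lower limit
            hyp[k] = min(hyp[k], v[2])  # upper limit
            hyp[k] = round(hyp[k], 5)   # significant digits
```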
san-soucie d3e9d69850
`--evolve 300` generations CLI argument (#3863)
* evolve command accepts argument for number of generations

* evolve generations argument used in evolve for loop

* evolve argument boolean fixes

* default to 300 evolve generations

* Update train.py

Co-authored-by: John San Soucie <jsansoucie@whoi.edu>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
2021-07-04 12:14:35 +02:00
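A minimal sketch of the resulting CLI semantics, using argparse's `nargs='?'` with `const=300` so that bare `--evolve` means 300 generations while `--evolve 1000` overrides the count; the help text is an assumption.

```python
import argparse

parser = argparse.ArgumentParser()
# `--evolve` alone evolves for 300 generations; `--evolve 1000` overrides the count
parser.add_argument('--evolve', type=int, nargs='?', const=300,
                    help='evolve hyperparameters for x generations')
opt = parser.parse_args()

if opt.evolve is not None:
    for generation in range(opt.evolve):  # evolution loop runs opt.evolve times
        ...
```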
Glenn Jocher c6c88dc601
Copy-Paste augmentation for YOLOv5 (#3845)
* Copy-paste augmentation initial commit

* if any segments

* Add obscuration rejection

* Add copy_paste hyperparameter

* Update comments
2021-07-01 00:35:04 +02:00
yellowdolphin 3974d725b6
Fix warmup `accumulate` (#3722)
* gradient accumulation during warmup in train.py

Context:
`accumulate` is the number of batches/gradients accumulated before calling the next optimizer.step().
During warmup, it is ramped up from 1 to the final value nbs / batch_size.
Although I have not seen this in other libraries, I like the idea. During warmup, when gradients are large, overly large steps are more of an issue than the gradient noise introduced by small steps.

The bug:
The condition for performing the optimizer step is wrong:
> `if ni % accumulate == 0:`
This produces irregular step sizes when `accumulate` is not constant. It becomes relevant when batch_size is small and `accumulate` changes many times during warmup.

This demo also shows the proposed solution, which is to use a `>=` condition instead (a sketch follows this entry):
https://colab.research.google.com/drive/1MA2z2eCXYB_BC5UZqgXueqL_y1Tz_XVq?usp=sharing

Further, I propose not restricting the number of warmup iterations to >= 1000: if the user changes hyp['warmup_epochs'], the restriction causes unexpected behavior, and it makes evolution unstable if this parameter were to be optimized.

* replace last_opt_step tracking by do_step(ni)

* add docstrings

* move down nw

* Update train.py

* revert math import move

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
2021-06-28 12:25:13 +02:00
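A self-contained sketch of the proposed `>=` fix with `last_opt_step` tracking, using a toy model; the ramp mirrors the description above (from 1 up to nbs / batch_size over the warmup iterations).

```python
import numpy as np
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
nbs, batch_size, nw = 64, 16, 100  # nominal batch size, actual batch size, warmup iters
last_opt_step = -1

for ni in range(200):  # ni = integrated batch number
    # accumulate ramps from 1 to nbs / batch_size over the nw warmup iterations
    accumulate = max(1, round(np.interp(ni, [0, nw], [1, nbs / batch_size])))
    loss = model(torch.randn(batch_size, 10)).mean()  # stand-in for the training loss
    loss.backward()
    if ni - last_opt_step >= accumulate:  # '>=' replaces the buggy 'ni % accumulate == 0'
        optimizer.step()
        optimizer.zero_grad()
        last_opt_step = ni
```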
Glenn Jocher 92d49fde35
Update seeds for single-GPU reproducibility (#3789)
For seed=0 on single-GPU runs.
2021-06-26 15:42:40 +02:00
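A minimal sketch of seeding for single-GPU reproducibility, assuming the usual trade-off: with seed=0, cuDNN benchmarking is disabled in favor of deterministic kernels.

```python
import random

import numpy as np
import torch
from torch.backends import cudnn

def init_seeds(seed=0):
    # Hypothetical sketch: seed every RNG; seed=0 selects slower-but-deterministic cuDNN
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    cudnn.benchmark, cudnn.deterministic = (False, True) if seed == 0 else (True, False)
```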
Piotr Skalski 09246a5a33
fix/incorrect_fitness_import (#3770) 2021-06-25 16:16:18 +02:00
Glenn Jocher f2d97ebb25
Remove DDP MultiHeadAttention fix (#3768) 2021-06-25 12:52:05 +02:00
Glenn Jocher f79d7479da
Add optional dataset.yaml `path` attribute (#3753)
* Add optional dataset.yaml `path` attribute

@KalenMike

* pass locals to python scripts

* handle lists

* update coco128.yaml

* Capitalize first letter

* add test key

* finalize GlobalWheat2020.yaml

* finalize objects365.yaml

* finalize SKU-110K.yaml

* finalize SKU-110K.yaml

* finalize VisDrone.yaml

* NoneType fix

* update download comment

* voc to VOC

* update

* update VOC.yaml

* update VOC.yaml

* remove dashes

* delete get_voc.sh

* force coco and coco128 to ../datasets

* Capitalize Argoverse_HD.yaml

* Capitalize Objects365.yaml

* update Argoverse_HD.yaml

* coco segments fix

* VOC single-thread

* update Argoverse_HD.yaml

* update data_dict in test handling

* create root
2021-06-25 01:25:03 +02:00
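A hypothetical sketch of how an optional `path` attribute can be resolved, covering the "handle lists" bullet: `train`/`val`/`test` entries, whether strings or lists, get the dataset root prepended.

```python
from pathlib import Path

# Stand-in for a parsed dataset.yaml with the optional `path` key
data = {'path': '../datasets/coco128',
        'train': 'images/train2017',
        'val': 'images/train2017',
        'test': None}

root = Path(data.get('path', ''))  # optional dataset root
for k in ('train', 'val', 'test'):
    if data.get(k):  # prepend root; entries may be a str or a list of strs
        data[k] = [str(root / x) for x in data[k]] if isinstance(data[k], list) else str(root / data[k])
```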
Glenn Jocher ae4261c774
Force non-zero hyp evolution weights `w` (#3748)
Fix for https://github.com/ultralytics/yolov5/issues/3741
2021-06-23 12:56:22 +02:00
Glenn Jocher fdc22398fa
Create `data/hyps` directory (#3747) 2021-06-23 12:49:38 +02:00
Glenn Jocher 1f69d12591
Update 4 main ops for paths and .run() (#3715)
* Add yolov5/ to path

* rename functions to run()

* cleanup

* rename fix

* CI fix

* cleanup find models/export.py
2021-06-21 17:25:04 +02:00
Ayush Chaurasia 75c0ff43af
W&B: Don't resume transfer learning runs (#3604)
* Allow config change

* Allow val change in wandb config

* Don't resume transfer learning runs

* Add entity in log dataset
2021-06-21 14:00:25 +02:00
Glenn Jocher e8810a53e8
Update DDP backend `if dist.is_nccl_available()` (#3705) 2021-06-20 17:15:42 +02:00
Glenn Jocher fbf41e0913
Add `train.run()` method (#3700)
* Update train.py explicit arguments

* Update train.py

* Add run method
2021-06-20 15:06:58 +02:00
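A usage sketch of the `run()` method this PR adds, assuming it forwards keyword arguments over the argparse defaults (run from within the repository root):

```python
# Hypothetical usage: keyword args override the argparse defaults,
# so training can start from Python without a CLI invocation
import train

train.run(data='coco128.yaml', imgsz=320, weights='yolov5s.pt', epochs=3)
```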
Glenn Jocher c1af67dcd4
Add torch DP warning (#3698) 2021-06-19 19:50:46 +02:00
Glenn Jocher b3e2f4e08d
Eliminate `total_batch_size` variable (#3697)
* Eliminate `total_batch_size` variable

* cleanup

* Update train.py
2021-06-19 19:14:59 +02:00
Glenn Jocher fad27c0046
Update DDP for `torch.distributed.run` with `gloo` backend (#3680)
* Update DDP for `torch.distributed.run`

* Add LOCAL_RANK

* remove opt.local_rank

* backend="gloo|nccl"

* print

* print

* debug

* debug

* os.getenv

* gloo

* gloo

* gloo

* cleanup

* fix getenv

* cleanup

* cleanup destroy

* try nccl

* return opt

* add --local_rank

* add timeout

* add init_method

* gloo

* move destroy

* move destroy

* move print(opt) under if RANK

* destroy only RANK 0

* move destroy inside train()

* restore destroy outside train()

* update print(opt)

* cleanup

* nccl

* gloo with 60 second timeout

* update namespace printing
2021-06-19 16:30:25 +02:00
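A minimal sketch of the backend selection this PR converged on: read `LOCAL_RANK`/`RANK`/`WORLD_SIZE` from the environment set by `torch.distributed.run`, prefer `nccl` when available, and fall back to `gloo` with the 60-second timeout the bullets mention.

```python
import os
from datetime import timedelta

import torch.distributed as dist

LOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # set by torch.distributed.run
RANK = int(os.getenv('RANK', -1))
WORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))

if LOCAL_RANK != -1:  # DDP mode only
    # Prefer nccl when compiled in and usable; otherwise fall back to gloo
    dist.init_process_group(backend='nccl' if dist.is_nccl_available() else 'gloo',
                            timeout=timedelta(seconds=60))
```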
lb-desupervised bfb2276b1d
Slightly modify CLI execution (#3687)
* Slightly modify CLI execution

This simple change makes it easier to run the primary functions of this
repo (train/detect/test) from within Python. An object which represents
`opt` can be constructed and fed to the `main` function of each of these
modules, rather than having to call the lower level functions directly,
or run the module as a script.

* Update export.py

Add CLI parsing update for more convenient module usage within Python.

Co-authored-by: Lewis Belcher <lb@desupervised.io>
2021-06-19 12:06:59 +02:00
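A usage sketch of the pattern this change enables, assuming `parse_opt()`/`main(opt)` entry points of the kind the refactors above introduce:

```python
# Hypothetical sketch: build an `opt` namespace in Python and pass it to a
# module's main(), instead of shelling out to the script
import detect  # any of train / detect / val

opt = detect.parse_opt()  # argparse defaults as a Namespace
opt.weights = 'yolov5s.pt'
opt.source = 'data/images'
detect.main(opt)
```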
Glenn Jocher 2296f1546f
Update `WORLD_SIZE` and `RANK` retrieval (#3670) 2021-06-17 23:24:30 +02:00
Glenn Jocher 045d5d8629
Update TensorBoard (#3669) 2021-06-17 22:12:42 +02:00
Glenn Jocher fa201f968e
Update `train(hyp, *args)` to accept `hyp` file or dict (#3668) 2021-06-17 22:03:25 +02:00
Glenn Jocher 6d6e2ca65f
Update train.py (#3667) 2021-06-17 21:32:39 +02:00
Wei Quan 4c5d9bff80
Fix incorrect end epoch comment (#3612) 2021-06-15 11:24:56 +02:00
Glenn Jocher 4984cf54be
train.py GPU memory fix (#3590)
* train.py GPU memory fix

* ema

* cuda

* cuda

* zeros input

* to device

* batch index 0
2021-06-11 20:24:03 +02:00