Glenn Jocher fad27c0046
Update DDP for torch.distributed.run with gloo backend (#3680)
* Update DDP for `torch.distributed.run`

* Add LOCAL_RANK

* remove opt.local_rank

* backend="gloo|nccl"

* print

* print

* debug

* debug

* os.getenv

* gloo

* gloo

* gloo

* cleanup

* fix getenv

* cleanup

* cleanup destroy

* try nccl

* return opt

* add --local_rank

* add timeout

* add init_method

* gloo

* move destroy

* move destroy

* move print(opt) under if RANK

* destroy only RANK 0

* move destroy inside train()

* restore destroy outside train()

* update print(opt)

* cleanup

* nccl

* gloo with 60 second timeout

* update namespace printing
2021-06-19 16:30:25 +02:00
..
2021-01-04 19:54:09 -08:00
2021-01-04 19:54:09 -08:00
2021-01-04 19:54:09 -08:00
2021-01-04 19:54:09 -08:00