* Do mixup in custom collate fn if prefetcher enabled, reduces performance impact
* Move mixup code to own file
* Add arg to disable prefetcher
* Fix no cuda transfer when prefetcher off
* Random erasing when prefetcher off wasn't changed to match new args, fixed
* Default random erasing to off (prob = 0.) for train