* Add knowledge distillation (a DistillationLoss sketch follows this list)
* Bugfix
* Bugfix
* Make names more readable and use a single torch.cat call
* Remove criterion.train() in engine
The teacher should stay in eval mode
* Change default argument for teacher-model
* Return the average of the classifiers during inference (see the two-head sketch after this list)
* Cleanup unused code
* Add docstring for DistillationLoss
* Remove warnings from newer PyTorch
Also use the more numerically stable variant: instead of softmax followed by log, use log_softmax directly (see the KL-divergence sketch after this list)
* support parallelized evaluation
* remove the shuffle arg of the val loader, add a val sampler in the non-distributed branch
* replace timm eval sampler with torch sampler
* add logger synchronization to support parallelized evaluation
* add command line argument dist-eval and a warning (a val-loader sketch follows this list)
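
For the knowledge-distillation and DistillationLoss items, a minimal sketch of a wrapper loss is shown below. The class name comes from the commits; the constructor arguments, the `alpha` weight, and the hard-distillation variant are illustrative assumptions, not the repository's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillationLoss(nn.Module):
    """Wrap a base criterion and add a distillation term computed
    against a frozen teacher model (sketch; argument names are assumptions)."""

    def __init__(self, base_criterion, teacher_model, alpha=0.5):
        super().__init__()
        self.base_criterion = base_criterion
        self.teacher_model = teacher_model
        self.alpha = alpha

    def forward(self, inputs, student_logits, labels):
        # Supervised loss on the student's own predictions.
        base_loss = self.base_criterion(student_logits, labels)
        # The teacher stays in eval mode and is never updated,
        # which is why criterion.train() must not touch it.
        with torch.no_grad():
            teacher_logits = self.teacher_model(inputs)
        # Hard distillation: train on the teacher's argmax labels.
        distill_loss = F.cross_entropy(student_logits, teacher_logits.argmax(dim=1))
        return (1 - self.alpha) * base_loss + self.alpha * distill_loss
```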
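"Return the average of the classifiers during inference" refers to a model with two classification heads (a class-token head and a distillation-token head). The sketch below illustrates the idea; the attribute names and module layout are assumptions.

```python
import torch.nn as nn

class TwoHeadClassifier(nn.Module):
    """Toy module with a class-token head and a distillation-token head
    (illustrative names, not the repository's actual model)."""

    def __init__(self, embed_dim, num_classes):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)       # class-token classifier
        self.head_dist = nn.Linear(embed_dim, num_classes)  # distillation-token classifier

    def forward(self, x_cls, x_dist):
        out_cls = self.head(x_cls)
        out_dist = self.head_dist(x_dist)
        if self.training:
            # During training, return both outputs so the loss can use them separately.
            return out_cls, out_dist
        # During inference, return the average of the two classifiers.
        return (out_cls + out_dist) / 2
```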
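The numerical-stability item replaces softmax followed by log with a direct log_softmax. A sketch of a soft-distillation KL term written that way is below; the temperature parameter `T` and the function name are assumptions.

```python
import torch.nn.functional as F

def soft_distillation_term(student_logits, teacher_logits, T=1.0):
    # log_softmax is more numerically stable than softmax(...).log().
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    log_p_teacher = F.log_softmax(teacher_logits / T, dim=1)
    # kl_div expects log-probabilities as input; with log_target=True the
    # teacher distribution is also passed in log-space.
    return F.kl_div(log_p_student, log_p_teacher,
                    reduction='batchmean', log_target=True) * (T * T)
```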
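The parallelized-evaluation items can be summarized by the loader sketch below: an explicit sampler replaces the shuffle argument, a plain torch DistributedSampler replaces the timm eval sampler when distributed evaluation is requested, and a warning is emitted when the validation set does not split evenly. The function name, its parameters, and the warning text are assumptions.

```python
import warnings
import torch

def build_val_loader(dataset_val, batch_size, distributed, dist_eval,
                     num_tasks=1, global_rank=0):
    if distributed and dist_eval:
        if len(dataset_val) % num_tasks != 0:
            warnings.warn(
                'The validation set is not divisible by the number of processes; '
                'padded duplicates will slightly alter the reported metrics.')
        # Plain torch sampler instead of the timm one; no shuffling for eval.
        sampler_val = torch.utils.data.DistributedSampler(
            dataset_val, num_replicas=num_tasks, rank=global_rank, shuffle=False)
    else:
        # The non-distributed branch also gets an explicit val sampler.
        sampler_val = torch.utils.data.SequentialSampler(dataset_val)
    # The shuffle argument is dropped in favour of the explicit sampler.
    return torch.utils.data.DataLoader(
        dataset_val, sampler=sampler_val, batch_size=batch_size, drop_last=False)
```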