* Add option for finetuning a model
* Fixes
* Keep model in eval mode during finetuning
* Only skip head weights if size mismatch
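A minimal sketch of what the size-mismatch handling might look like when loading a pretrained checkpoint for finetuning; the helper itself and the `head.weight`/`head.bias` key names are assumptions, not the actual code:

```python
import torch

def load_finetune_checkpoint(model, checkpoint_path):
    """Load pretrained weights for finetuning, dropping only the classifier
    head entries whose shape does not match the new task."""
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    state_dict = checkpoint.get('model', checkpoint)

    # Only skip head weights if their size actually mismatches the new head.
    for key in ('head.weight', 'head.bias'):
        if key in state_dict and key in model.state_dict() \
                and state_dict[key].shape != model.state_dict()[key].shape:
            print(f'Removing {key} from the pretrained checkpoint: size mismatch')
            del state_dict[key]

    model.load_state_dict(state_dict, strict=False)
    return model
```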
* Remove finetune-epochs
Might not be needed
* Raise error if distillation + finetune are enabled
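The guard presumably looks something like the following; the `--finetune` and `--distillation-type` option names are assumptions:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--finetune', default='', help='checkpoint to finetune from')
parser.add_argument('--distillation-type', default='none',
                    choices=['none', 'soft', 'hard'], help='distillation variant')
args = parser.parse_args()

# Fail early if both features are requested at once.
if args.distillation_type != 'none' and args.finetune:
    raise NotImplementedError('Finetuning with distillation is not yet supported')
```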
* Add knowledge distillation
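A hedged sketch of such a distillation loss: a frozen teacher provides targets, and the wrapper combines the base criterion with either a soft (KL) or hard (teacher argmax) term. The constructor arguments, default values, and the tuple-output convention for the student are assumptions:

```python
import torch
import torch.nn.functional as F


class DistillationLoss(torch.nn.Module):
    """Wraps a base criterion and adds a distillation term computed against a
    frozen teacher: 'soft' uses a KL divergence between temperature-scaled
    distributions, 'hard' uses cross-entropy on the teacher's argmax labels."""

    def __init__(self, base_criterion, teacher_model, distillation_type='hard',
                 alpha=0.5, tau=1.0):
        super().__init__()
        self.base_criterion = base_criterion
        self.teacher_model = teacher_model
        self.distillation_type = distillation_type
        self.alpha = alpha
        self.tau = tau

    def forward(self, inputs, outputs, labels):
        # The student may return (class_token_logits, distillation_token_logits).
        outputs_kd = None
        if isinstance(outputs, tuple):
            outputs, outputs_kd = outputs
        base_loss = self.base_criterion(outputs, labels)
        if self.distillation_type == 'none' or outputs_kd is None:
            return base_loss

        # The teacher stays in eval mode and is never backpropagated through.
        with torch.no_grad():
            teacher_outputs = self.teacher_model(inputs)

        if self.distillation_type == 'soft':
            T = self.tau
            # log_softmax on both sides (log_target=True) avoids the less
            # stable softmax-then-log formulation.
            distillation_loss = F.kl_div(
                F.log_softmax(outputs_kd / T, dim=1),
                F.log_softmax(teacher_outputs / T, dim=1),
                reduction='sum', log_target=True,
            ) * (T * T) / outputs_kd.numel()
        else:  # 'hard'
            distillation_loss = F.cross_entropy(outputs_kd, teacher_outputs.argmax(dim=1))

        return base_loss * (1 - self.alpha) + distillation_loss * self.alpha
```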
* Bugfix
* Bugfix
* Make names more readable and use single torch.cat call
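For illustration, prepending the class and distillation tokens to the patch sequence can be done with one concatenation; the helper and tensor names below are hypothetical:

```python
import torch

def prepend_tokens(x, cls_token, dist_token):
    """Prepend class and distillation tokens to the patch sequence
    with a single torch.cat call (x: [B, N, D], tokens: [1, 1, D])."""
    batch_size = x.shape[0]
    cls_tokens = cls_token.expand(batch_size, -1, -1)
    dist_tokens = dist_token.expand(batch_size, -1, -1)
    return torch.cat((cls_tokens, dist_tokens, x), dim=1)
```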
* Remove criterion.train() in engine
The teacher should stay in eval mode
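A sketch of the engine-side consequence: only the student model's mode is toggled (and it can be left in eval mode when finetuning), while the criterion holding the teacher is never switched to train mode. The function signature is an assumption:

```python
def train_one_epoch(model, criterion, data_loader, optimizer, device,
                    set_training_mode=True):
    # Only the student toggles mode; when finetuning this can stay False so
    # the model remains in eval mode. The criterion (and the teacher it
    # wraps) is deliberately left alone -- no criterion.train() here.
    model.train(set_training_mode)
    for samples, targets in data_loader:
        samples = samples.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        outputs = model(samples)
        loss = criterion(samples, outputs, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```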
* Change default argument for teacher-model
* Return the average of classifiers during inference
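A sketch of what averaging the two classifiers at inference might look like; the module and attribute names are hypothetical:

```python
import torch

class DualHead(torch.nn.Module):
    """Classifier pair on the class and distillation tokens; averages the
    two predictions at inference time."""

    def __init__(self, embed_dim, num_classes):
        super().__init__()
        self.head = torch.nn.Linear(embed_dim, num_classes)
        self.head_dist = torch.nn.Linear(embed_dim, num_classes)

    def forward(self, x_cls, x_dist):
        x_cls, x_dist = self.head(x_cls), self.head_dist(x_dist)
        if self.training:
            # Keep the two logits separate so the distillation loss can use them.
            return x_cls, x_dist
        # During inference, return the average of the two classifiers.
        return (x_cls + x_dist) / 2
```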
* Cleanup unused code
* Add docstring for DistillationLoss
* Remove warnings from newer PyTorch
Also switches to a more numerically stable variant: instead of softmax followed by log, it uses log_softmax directly
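In isolation, the change might look like this; the temperature scaling and the division by the number of elements are assumptions carried over from the distillation sketch above:

```python
import torch
import torch.nn.functional as F

student_logits = torch.randn(8, 1000)
teacher_logits = torch.randn(8, 1000)
T = 1.0

# Older pattern: softmax followed by an explicit log (numerically less stable).
old = F.kl_div(F.softmax(student_logits / T, dim=1).log(),
               F.softmax(teacher_logits / T, dim=1),
               reduction='sum') * (T * T) / student_logits.numel()

# Newer pattern: log_softmax directly, with the target also given as log-probabilities.
new = F.kl_div(F.log_softmax(student_logits / T, dim=1),
               F.log_softmax(teacher_logits / T, dim=1),
               reduction='sum', log_target=True) * (T * T) / student_logits.numel()
```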
* Support parallelized evaluation
* Remove the shuffle arg from the val loader, add a val sampler in the non-distributed branch
* Replace the timm eval sampler with a torch sampler
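A sketch of the resulting validation-loader setup, with a plain torch sampler in both branches; the helper and argument names are assumptions:

```python
import torch

def build_val_loader(dataset_val, batch_size, num_workers,
                     distributed=False, num_tasks=1, global_rank=0):
    if distributed:
        # Shard the validation set across processes (enabled by --dist-eval).
        sampler_val = torch.utils.data.DistributedSampler(
            dataset_val, num_replicas=num_tasks, rank=global_rank, shuffle=False)
    else:
        # Non-distributed branch: an explicit sequential sampler instead of
        # a shuffle argument or a timm-specific eval sampler.
        sampler_val = torch.utils.data.SequentialSampler(dataset_val)

    return torch.utils.data.DataLoader(
        dataset_val, sampler=sampler_val, batch_size=batch_size,
        num_workers=num_workers, pin_memory=True, drop_last=False)
```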
* Add logger synchronization to support parallelized evaluation
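A hedged sketch of such a synchronization step using torch.distributed.all_reduce; the `SmoothedValue` name and its fields are assumptions modelled on common reference training scripts, and a CUDA backend is assumed:

```python
import torch
import torch.distributed as dist

class SmoothedValue:
    """Tracks a running total and count so a global average can be reported."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.total += value * n
        self.count += n

    def synchronize_between_processes(self):
        # Sum total and count over all processes so every rank reports the
        # same global average after a distributed evaluation pass.
        if not (dist.is_available() and dist.is_initialized()):
            return
        t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda')
        dist.barrier()
        dist.all_reduce(t)
        self.count = int(t[0].item())
        self.total = t[1].item()

    @property
    def global_avg(self):
        return self.total / max(self.count, 1)
```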
* Add command-line argument dist-eval and a warning
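The flag and the accompanying warning might look roughly like this; the helper names, the exact wording, and the `dataset_val`/`num_tasks` parameters are assumptions:

```python
import argparse

def add_dist_eval_arg(parser):
    parser.add_argument('--dist-eval', action='store_true', default=False,
                        help='Enable distributed evaluation of the validation set')

def maybe_warn_uneven_split(dataset_val, num_tasks, dist_eval):
    # Padding with duplicate samples slightly alters the reported metrics,
    # so warn when the validation set does not divide evenly across processes.
    if dist_eval and len(dataset_val) % num_tasks != 0:
        print('Warning: the validation set is not divisible by the number of processes; '
              'duplicate entries will be added so every process sees the same number of '
              'samples, which slightly alters the results.')
```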