* support distillation early stop
* add norm arg to ChannelWiseDivergence
* add norm connector and support a list of connectors
* delete redundant code in cwd now that the norm connector handles normalization
* fix fpn distill
* fix pytest
* rename stop distillation hook
* rename stop_epoch and add doc
* rename
* replace = with >=
* set _is_init private attribute of the teacher model to True after loading checkpoint
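
Note: a rough sketch of the early-stop behaviour described above, for reviewers. The class name `StopDistillSketch`, the `distilling` flag, and the `total_loss` helper are hypothetical and are not the names used in this change. It also assumes that the `>=` item refers to the stop-epoch comparison, which the log does not state.

```python
class StopDistillSketch:
    """Hypothetical stand-in for a stop-distillation hook (not the real API)."""

    def __init__(self, stop_epoch):
        self.stop_epoch = stop_epoch
        self.distilling = True

    def before_train_epoch(self, epoch):
        # Assumption: `>=` is used so that a run resumed past stop_epoch
        # still turns distillation off, which `==` would miss.
        if epoch >= self.stop_epoch:
            self.distilling = False


def total_loss(task_loss, distill_loss, hook):
    # Drop the distillation term once the hook has stopped distillation.
    return task_loss + (distill_loss if hook.distilling else 0.0)
```

With `stop_epoch=2`, a call at epoch 1 leaves distillation on, and a call at any epoch of 2 or later turns it off.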