* resolve conflicts
* add heads and config for multilabel tasks
* minor change
* remove evaluating mAP in head
* add baseline config
* add configs
* reserve only one config
* minor change
* fix minor bug
* minor change
* minor change
* add unittests and fix docstrings
* support thr
* replace thrs with thr
* fix docstring
* minor change
* revise according to comments
* revise according to comments
* revise according to comments
* rewrite basedataset.evaluate to avoid duplicate calculation
* minor change
* change thr to thrs
* add more unit test
* support class-wise evaluation results and move eval_metrics.py
* Fix docstring
* change average to be non-optional
* revise according to comments
* add more unittest
* add macro-averaged precision, recall, and F1 options in evaluation
* remove unnecessary comments
* Revise according to comments
* Revise according to comments
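For reference, the macro-averaged precision/recall/F1 with a score threshold (`thr`) added in the commits above can be sketched roughly as follows. This is an illustrative standalone sketch, not the actual code merged in this PR; the function name and argument layout are hypothetical.

```python
def macro_precision_recall_f1(pred_scores, gt_labels, thr=0.5):
    """Illustrative macro-averaged precision/recall/F1 for multilabel tasks.

    pred_scores, gt_labels: nested lists of shape (num_samples, num_classes);
    a score >= thr counts as a positive prediction for that class.
    Metrics are computed per class, then averaged with equal class weight
    (macro averaging), which is one of the options the commits describe.
    """
    num_classes = len(pred_scores[0])
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = fp = fn = 0
        for scores, labels in zip(pred_scores, gt_labels):
            pred_pos = scores[c] >= thr
            gt_pos = labels[c] == 1
            if pred_pos and gt_pos:
                tp += 1
            elif pred_pos:
                fp += 1
            elif gt_pos:
                fn += 1
        # Guard against divide-by-zero for classes with no positives.
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = num_classes
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Class-wise results (mentioned in the commits) would simply be the per-class lists before the final averaging step.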