* minor changes
* change to ModuleList
* change to Sequential
* replace dropout with attn_drop and proj_drop in MultiheadAttention
* add operation_name for attn
* add drop path and move all ffn args to ffn_cfgs
* fix typo
* fix a bug when using the default value of ffn_cfgs
* fix ffns
* add deprecation warning
* fix deprecation warning
* change to pop kwargs
* support registering FFN of transformer
* support batch first
* fix batch first wrapper
* fix forward wrapper
* fix typo
* fix lint
* add unit test for transformer
* fix unit test
* fix equal
* use allclose
* fix comments
* fix comments
* change ConfigDict to dict
* move drop to a file
* add comments for drop path
* add noqa E501
* move bnc wrapper to MultiheadAttention
* move bnc wrapper to MultiheadAttention
* use deprecation warning
* resolve comments
* add unit test
* rename residual to identity
* revert runner
* rename msda residual to identity
* rename inp_identity to identity
* fix name
* fix transformer
* remove key in msda
* remove assert for key
Co-authored-by: HIT-cwh <2892770585@qq.com>
Co-authored-by: bkhuang <congee524@gmail.com>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>