* support bf16 in AmpOptimWrapper
* add docstring
* modify docs
* add unittests for bf16 in AmpOptimWrapper
* fix type
* fix to pass CI
* fix unittest skip logic to pass CI
* fix per review comment
* add type hints
* fix docstring and add warning information
* remove check for PyTorch>=1.6 in unittest
* modify unittest
* modify unittest
* remove torch.float32 and torch.float64 from valid dtypes
* fix per review comments
* minor refine docstring
* fix parameterized unittest to pass CI
* fix unittest and add back torch.float32 and torch.float64 as valid dtypes