* enable compile configurations to support torch.compile in Runner
* enable compilation in train, val and test
* address review comments
* add docstring to illustrate usage
* refine error message
* add unittests
* fix unittest skip
* add logging message to inform users
* compile `train_step`, `val_step`, `test_step` instead
* address review comments
* revert to compiling `train_step` only due to a PyTorch 2.0 issue
* add documentation about torch.compile
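The compile support described above is typically enabled through a `compile` field in the Runner config. A minimal sketch, assuming the field accepts either a bool or a dict of `torch.compile` keyword arguments (the model name is hypothetical):

```python
# Enable compilation of `train_step` with default torch.compile settings
# (assumes `compile` is a top-level Runner config field).
cfg = dict(
    model=dict(type='MyModel'),  # hypothetical model registry name
    compile=True,
)

# Or pass torch.compile options explicitly as a dict:
cfg = dict(
    model=dict(type='MyModel'),
    compile=dict(backend='inductor', mode='max-autotune'),
)
```

Passing a dict lets users tune the backend and compilation mode without changing Runner code.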
* support bf16 in AmpOptimWrapper
* add docstring
* modify docs
* add unittests for bf16 in AmpOptimWrapper
* fix typo
* fix to pass CI
* fix unittest skip logic to pass CI
* address review comment
* add type hints
* fix docstring and add warning information
* remove check for PyTorch >= 1.6 in unittest
* modify unittest
* modify unittest
* remove torch.float32 and torch.float64 from valid dtypes
* address review comments
* refine docstring
* fix unittest parameterized to pass CI
* fix unittest and add back torch.float32, torch.float64
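The bf16 support added to AmpOptimWrapper is typically selected through a `dtype` field in the optimizer wrapper config. A minimal sketch, assuming `dtype` accepts dtype names such as `'float16'` and `'bfloat16'` (the optimizer settings are illustrative):

```python
# Request bfloat16 autocast instead of the default float16
# (assumes AmpOptimWrapper exposes a `dtype` config field).
optim_wrapper = dict(
    type='AmpOptimWrapper',
    dtype='bfloat16',  # run the forward/backward pass under bf16 autocast
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9),
)
```

bf16 keeps float32's exponent range, so training with it usually needs no loss scaling, unlike float16.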