Ma Zerun b017670e1b
[Improve] Use PyTorch official scaled_dot_product_attention to accelerate MultiheadAttention. (#1434)
* [Improve] Use PyTorch official `scaled_dot_product_attention` to accelerate `MultiheadAttention`.

* Support the `--local-rank` and `--amp` options for newer PyTorch versions.

* Fix imports and unit tests.
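
A minimal sketch of the speedup this change relies on: PyTorch 2.0's fused `torch.nn.functional.scaled_dot_product_attention` computes the same result as the explicit `softmax(QK^T / sqrt(d)) V` attention that `MultiheadAttention` previously implemented by hand (the `manual_attention` helper below is illustrative, not the project's actual code):

```python
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # Explicit reference implementation: softmax(QK^T / sqrt(d)) V
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale
    attn = attn.softmax(dim=-1)
    return attn @ v

# Shapes: (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 16, 64)
k = torch.randn(2, 8, 16, 64)
v = torch.randn(2, 8, 16, 64)

# Fused kernel (requires PyTorch >= 2.0); dispatches to FlashAttention
# or a memory-efficient backend when available.
out_fused = F.scaled_dot_product_attention(q, k, v)
out_manual = manual_attention(q, k, v)

# The two paths agree numerically.
assert torch.allclose(out_fused, out_manual, atol=1e-5)
```

On CUDA devices the fused call can select FlashAttention-style kernels, avoiding materializing the full attention matrix; on older PyTorch the module would fall back to the manual path.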
2023-03-29 15:50:44 +08:00