Defaults to "gelu", but makes it possible to pass "gelu_tanh". Makes it easier to port weights from JAX/Flax, where the tanh approximation is the default.

Name |
---|
data |
layers |
loss |
models |
optim |
scheduler |
utils |
__init__.py |
version.py |
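
The difference between the two activation variants named in the commit message can be sketched in plain Python. This is an illustration only, not the repository's code: `gelu_exact`, `gelu_tanh`, and `max_abs_gap` are hypothetical helper names, and the formulas are the standard erf-based GELU and its commonly used tanh approximation.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (the form JAX/Flax use by default)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_gap(points) -> float:
    """Largest absolute difference between the two variants over `points`."""
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in points)


if __name__ == "__main__":
    # Evaluate both variants on an evenly spaced grid over [-6, 6].
    grid = [i / 100.0 for i in range(-600, 601)]
    print(f"max |exact - tanh| on [-6, 6]: {max_abs_gap(grid):.2e}")
```

The two curves are close but not identical, so weights trained with the tanh variant are best evaluated with that same variant after porting, which is presumably why the option is exposed.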