Activation functions in PyTorch (3)
Super Kai (Kazuya Ito)
Posted on October 5, 2024
*Memos:
- My post explains PReLU() and ELU().
- My post explains SELU() and CELU().
- My post explains Step function, Identity and ReLU.
- My post explains Leaky ReLU, PReLU and FReLU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Tanh, Softsign, Sigmoid and Softmax.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
- My post explains layers in PyTorch.
- My post explains loss functions in PyTorch.
- My post explains optimizers in PyTorch.
(1) ELU(Exponential Linear Unit):
- can convert an input value (x) to the output value between ae^x - a and x. *Memos:
  - If x < 0, then ae^x - a, while if 0 <= x, then x.
  - a is 1.0 by default basically.
- is ELU() in PyTorch. *A short usage sketch follows this list.
- 's pros:
- It normalizes negative input values.
- The convergence with negative input values is stable.
- It mitigates Vanishing Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
- 's cons:
- It's computationally expensive because of the exponential operation.
- It's non-differentiable at x = 0 if a is not 1.
- 's graph in Desmos:
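As a quick illustration of the point above that ELU() is available in PyTorch, here is a minimal usage sketch with nn.ELU(); the input tensor is just an arbitrary example and the printed values are rounded:

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])  # arbitrary example inputs

elu = nn.ELU()  # a (alpha) is 1.0 by default
print(elu(x))
# tensor([-0.8647, -0.6321,  0.0000,  1.0000,  2.0000])

elu2 = nn.ELU(alpha=2.0)  # a (alpha) changed to 2.0
print(elu2(x))
# tensor([-1.7293, -1.2642,  0.0000,  1.0000,  2.0000])
```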
(2) SELU(Scaled Exponential Linear Unit):
- can convert an input value (x) to the output value between λ(ae^x - a) and λx. *Memos:
  - If x < 0, then λ(ae^x - a), while if 0 <= x, then λx.
  - λ = 1.0507009873554804934193349852946.
  - a = 1.6732632423543772848170429916717.
- is SELU() in PyTorch. *A short usage sketch follows this list.
- 's pros:
- It normalizes negative input values.
- The convergence with negative input values is stable.
- It mitigates Vanishing Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
- 's cons:
- It may cause Exploding Gradient Problem because a positive input value is amplified by multiplication with λ.
- It's computationally expensive because of the exponential operation.
- It's non-differentiable at x = 0 if a is not 1.
- 's graph in Desmos:
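Similarly, a minimal usage sketch with nn.SELU(); λ and a are fixed inside PyTorch, so there is no parameter to pass (the input tensor is again just an arbitrary example, outputs rounded):

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])  # arbitrary example inputs

selu = nn.SELU()  # λ and a are the fixed constants listed above
print(selu(x))
# tensor([-1.5202, -1.1113,  0.0000,  1.0507,  2.1014])
```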
(3) CELU(Continuously Differentiable Exponential Linear Unit):
- is an improved ELU, being able to differentiate at x = 0 even if a is not 1.
- can convert an input value (x) to the output value between ae^(x/a) - a and x. *Memos:
  - If x < 0, then ae^(x/a) - a, while if 0 <= x, then x.
  - a is 1.0 by default basically.
- 's formula is y = max(0, x) + min(0, a(e^(x/a) - 1)).
- is CELU() in PyTorch. *A short usage sketch follows this list.
- 's pros:
- It normalizes negative input values.
- The convergence with negative input values is stable.
- It mitigates Vanishing Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0, so Dying ReLU Problem is not completely avoided.
- 's cons:
- It's computationally expensive because of the exponential operation.
- 's graph in Desmos:
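Finally, a minimal usage sketch with nn.CELU(); with the default a of 1.0 it matches ELU(), and with a different a it stays differentiable at x = 0 (inputs arbitrary, outputs rounded):

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])  # arbitrary example inputs

celu = nn.CELU()  # a (alpha) is 1.0 by default, so same output as ELU() here
print(celu(x))
# tensor([-0.8647, -0.6321,  0.0000,  1.0000,  2.0000])

celu2 = nn.CELU(alpha=2.0)  # a(e^(x/a) - 1) with a = 2.0 for negative inputs
print(celu2(x))
# tensor([-1.2642, -0.7869,  0.0000,  1.0000,  2.0000])
```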