The Sigmoid activation function is also known as the Logistic function. It transforms any real-valued input into a value between 0 and 1.
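As a quick illustration, here is a minimal NumPy sketch of the logistic function (the helper name and sample inputs are just for demonstration):

```python
import numpy as np

def sigmoid(x):
    # Logistic function: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```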
The Hyperbolic Tangent, also known as Tanh, is a similarly shaped nonlinear activation function whose outputs range from -1.0 to 1.0 (instead of 0 to 1 in the case of the Sigmoid function).
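The same kind of sketch for Tanh, using NumPy's built-in implementation (the sample inputs are arbitrary):

```python
import numpy as np

# np.tanh squashes any real input into the range (-1, 1)
print(np.tanh(np.array([-3.0, 0.0, 3.0])))  # ~[-0.995, 0.0, 0.995]
```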
Problem: Vanishing Gradients
A general problem with both the Sigmoid and Tanh functions is vanishing gradients. Looking at the function plots, you can see that when inputs become very negative or very positive, the Sigmoid function saturates at 0 or 1, and the Tanh function saturates at -1 or 1, with a derivative extremely close to 0. A saturated unit therefore passes almost no gradient back through the network, so there is almost nothing left for the lower layers to learn from. This prevents models from learning effectively, especially in deep networks.
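A small sketch of this effect, evaluating the analytic derivatives (sigmoid'(x) = s(1 - s) and tanh'(x) = 1 - tanh²(x)) at a few input values; the helper names are chosen only for this example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative of the logistic function

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # derivative of tanh

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.2e}  tanh'={tanh_grad(x):.2e}")
# At x=10 both derivatives are tiny (~4.5e-05 for sigmoid, ~8.2e-09 for tanh),
# so a gradient that passes through many saturated units shrinks toward zero.
```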