The Sigmoid activation function is also known as the Logistic function. It transforms any real-valued input into a value between 0 and 1.
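As a quick illustration, here is a minimal NumPy sketch of the logistic function (the helper name and sample inputs are just for demonstration):

```python
import numpy as np

def sigmoid(x):
    # Logistic function: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```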
The Hyperbolic Tangent, also known as Tanh, is a similarly shaped nonlinear activation function whose outputs range from -1.0 to 1.0 (instead of 0 to 1 in the case of the Sigmoid function).
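The same kind of sketch for Tanh, using NumPy's built-in implementation (the sample inputs are arbitrary):

```python
import numpy as np

# np.tanh squashes any real input into the range (-1, 1)
print(np.tanh(np.array([-3.0, 0.0, 3.0])))  # ~[-0.995, 0.0, 0.995]
```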
Problem: Vanishing Gradients
A general problem with both the Sigmoid and Tanh functions is vanishing gradients. Looking at the function plots, you can see that when inputs become very negative or very positive, the Sigmoid function saturates at 0 or 1, and the Tanh function saturates at -1 or 1, with a derivative extremely close to 0. A saturated unit therefore passes almost no gradient back through the network, so there is almost nothing left for the lower layers to learn from. This prevents models from learning effectively, especially in deep networks.
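A small sketch of this effect, evaluating the analytic derivatives (sigmoid'(x) = s(1 - s) and tanh'(x) = 1 - tanh²(x)) at a few input values; the helper names are chosen only for this example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative of the logistic function

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # derivative of tanh

for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.2e}  tanh'={tanh_grad(x):.2e}")
# At x=10 both derivatives are tiny (~4.5e-05 for sigmoid, ~8.2e-09 for tanh),
# so a gradient that passes through many saturated units shrinks toward zero.
```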