Monday, July 12, 2021

Problems with Sigmoid and Tanh activation functions

The Sigmoid activation function is also known as the Logistic function. The input to the function is transformed into a value between 0 and 1.

The Hyperbolic Tangent, also known as Tanh, is a similarly shaped nonlinear activation function whose outputs range from -1.0 to 1.0 (instead of 0 to 1 as with the Sigmoid function).
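As a quick check of the two output ranges described above, here is a minimal sketch using only Python's standard math module. The sample inputs are arbitrary values chosen for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes any real input into the open interval (-1, 1)."""
    return math.tanh(x)

# Arbitrary sample inputs, chosen only for illustration.
samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
sigmoid_values = [sigmoid(x) for x in samples]
tanh_values = [tanh(x) for x in samples]

for x, s, t in zip(samples, sigmoid_values, tanh_values):
    print(f"x={x:5.1f}  sigmoid={s:.4f}  tanh={t:.4f}")
```

Sigmoid is centred on 0.5 and Tanh on 0, which is the main visible difference between the two curves.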

Problem: Vanishing Gradients

A problem shared by the Sigmoid and Tanh functions is vanishing gradients. On a plot of either function, you can see that when inputs become very negative or very positive, the Sigmoid function saturates at 0 or 1 and the Tanh function saturates at -1 or 1, with a derivative extremely close to 0. A saturated unit therefore has almost no gradient to propagate back through the network, so almost nothing is left for the lower layers. This prevents models from learning effectively, especially deep networks.
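The saturation argument can be made concrete with the derivatives themselves. The sketch below is illustrative only: it uses the standard identities sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)) and tanh'(x) = 1 - tanh(x)^2, and the helper best_case_sigmoid_chain is a hypothetical name that ignores the weight matrices, which a real network would also multiply in.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; its maximum is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x: float) -> float:
    """Derivative of tanh; its maximum is 1.0, reached at x = 0."""
    return 1.0 - math.tanh(x) ** 2

def best_case_sigmoid_chain(n_layers: int) -> float:
    """Upper bound on the product of n sigmoid derivatives.

    Assumption for illustration: weights are ignored, so this is only
    the contribution of the activation functions to the chain rule.
    """
    return 0.25 ** n_layers

# The gradient shrinks quickly as the input moves away from 0.
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  tanh'={tanh_grad(x):.6f}")

# Even in the best case, stacking sigmoid layers shrinks the signal.
for n in [1, 5, 10]:
    print(f"{n:2d} layers: at most {best_case_sigmoid_chain(n):.2e}")
```

Because the sigmoid derivative never exceeds 0.25, the activation terms alone shrink the backpropagated signal at every layer, even when no unit is saturated.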


Saturday, July 10, 2021

Backprop: What you need to know

1. Gradients are important:

- If it's differentiable, we can probably learn on it.

2. Gradients can vanish:

- Each additional layer can successively reduce signal vs noise.

- ReLUs are useful here.

3. Gradients can explode:

- Learning rates are important here.

- Batch normalisation can help.

4. ReLU layers can die:

- Keep calm and lower your learning rates.

Source: Machine Learning Crash Course by Google.
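To illustrate point 4, here is a toy sketch (not taken from the course) of how one oversized update can kill a ReLU unit. Everything in it is an assumption made for illustration: a single unit y = relu(w * x + b), three positive inputs, and an upstream gradient of 1.0 for every example.

```python
def relu_grad(z: float) -> float:
    """Derivative of ReLU: 1 for positive pre-activations, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

# Hypothetical toy data: all inputs are positive.
inputs = [0.5, 1.0, 1.5]

def weight_gradient(w: float, b: float, upstream: float = 1.0) -> float:
    """Gradient of the summed loss w.r.t. w for y = relu(w * x + b).

    Assumes the same upstream gradient for every example.
    """
    return sum(upstream * relu_grad(w * x + b) * x for x in inputs)

w, b = 1.0, 0.0
grad_before = weight_gradient(w, b)           # every input is active

# A small step keeps the unit alive.
w_small = w - 0.1 * grad_before
grad_small = weight_gradient(w_small, b)

# One oversized step pushes w far negative, so every pre-activation
# is negative and the gradient becomes exactly zero.
w_dead = w - 10.0 * grad_before
grad_after = weight_gradient(w_dead, b)

print(grad_before, grad_small, w_dead, grad_after)
```

Once the gradient is zero for every input, gradient descent can no longer move w back, which is why a lower learning rate is the usual first remedy.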

