About Me

Hi! I am Rahul. After years of bouncing around different sectors, I now specialize in Python, Machine Learning, Deep Learning, NLP, and Statistics. As a technology lover, I strongly believe that 'Nothing can stress or stop you from achieving your dreams if you cherish hope more than your fears.'

Friday, April 2, 2021

Why do we need activation functions?

### 0001

We use a lot of activation functions in our daily projects, but do we really know why we need an activation function in the first place?

Reason 1:

Well, if you chain several linear transformations, all you get is a linear transformation. 

For example, if f(x) = 2x + 3 and g(x) = 5x - 1, then chaining these two linear functions gives you another linear function: f(g(x)) = 2(5x-1)+3 = 10x + 1. So if you don't have some nonlinearity between layers, then even a deep stack of layers is equivalent to a single layer, and you can't solve very complex problems with that.
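To make this concrete, here is a small NumPy sketch (the layer sizes and random weights are made up purely for illustration): two stacked linear layers with no activation in between produce exactly the same outputs as a single linear layer with combined weights.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 3))        # a batch of 4 inputs with 3 features

W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)   # first "layer" (no activation)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)   # second "layer" (no activation)

# Chaining the two linear layers...
deep_output = (x @ W1 + b1) @ W2 + b2

# ...is exactly one linear layer with combined weights and bias.
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
single_output = x @ W_combined + b_combined

print(np.allclose(deep_output, single_output))  # True
```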

Reason 2:

If you want to guarantee that the output will always be positive, then you can use the ReLU activation function in the output layer. Alternatively, you can use the "softplus" activation function, which is a smooth variant of ReLU: softplus(z) = log(1+exp(z)). It is close to 0 when z is negative and close to z when z is positive. 
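As a quick illustration (the values of z below are arbitrary), here is how ReLU and softplus behave on a few inputs: both keep the output from going negative, and softplus stays close to 0 for negative z and close to z for positive z.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softplus(z):
    return np.log1p(np.exp(z))   # log(1 + exp(z)); fine for moderate z

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(relu(z))       # [ 0.   0.   0.   1.  10. ]
print(softplus(z))   # [~0.00005  ~0.313  ~0.693  ~1.313  ~10.00005]
```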

Finally, if you want to guarantee that the predictions will fall within a given range of values, then you can use the logistic function or the hyperbolic tangent and then scale the labels to the appropriate range: 0 to 1 for the logistic function and -1 to 1 for the hyperbolic tangent.
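Here is one hedged sketch of that last idea, assuming a scikit-learn MinMaxScaler and made-up target values: scale the labels into [0, 1] so a sigmoid (logistic) output layer can cover their range, then invert the scaling on the predictions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

y_train = np.array([[120.0], [340.0], [75.0], [510.0]])   # made-up regression targets

scaler = MinMaxScaler(feature_range=(0, 1))
y_train_scaled = scaler.fit_transform(y_train)            # labels now in [0, 1]

# ...train a model whose output layer uses a sigmoid activation on y_train_scaled...

y_pred_scaled = np.array([[0.25], [0.80]])                # pretend model predictions
y_pred = scaler.inverse_transform(y_pred_scaled)          # back to the original label range
print(y_pred)
```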


Source: Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, Chapter 10

