Into Data Science
About Me
- Rahul Kumar
- Hi! I am Rahul. After years of bouncing around different sectors, I now specialize in Python, Machine Learning, Deep Learning, NLP, and Statistics. As a technology lover, I strongly believe: 'Nothing can stress or stop you from achieving your dreams if you cherish hope more than your fears.'
Monday, July 12, 2021
Problems with Sigmoid and Tanh activation functions
Saturday, July 10, 2021
Backprop: What you need to know
1. Gradients are important:
- If it's differentiable, we can probably learn on it.
2. Gradients can vanish:
- Each additional layer can successively reduce signal vs noise.
- ReLUs are useful here.
3. Gradients can explode:
- Learning rates are important here.
- Batch normalisation can help.
4. ReLU layers can die:
- Keep calm and lower your learning rates.
Source: Machine Learning Crash Course by Google.
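The vanishing-gradient point above can be made concrete with a toy NumPy sketch (my own illustration, not from the course): in a deep chain of one-unit layers, each sigmoid layer multiplies the gradient by a local derivative of at most 0.25, so the product shrinks rapidly with depth, while a ReLU's derivative is either 1 (signal passes through) or exactly 0 (the "dead ReLU" case).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_magnitude(depth, activation="sigmoid", rng=None):
    """Product of per-layer derivative factors in a 1-unit-per-layer chain."""
    if rng is None:
        rng = np.random.default_rng(0)   # fixed seed for reproducibility
    grad = 1.0
    for _ in range(depth):
        z = rng.normal()                 # pre-activation at this layer
        w = rng.normal()                 # weight at this layer
        if activation == "sigmoid":
            local = sigmoid(z) * (1 - sigmoid(z))   # at most 0.25
        else:  # ReLU: derivative is 1 for z > 0, else 0 (a "dead" path)
            local = 1.0 if z > 0 else 0.0
        grad *= w * local
    return abs(grad)

print("depth  2, sigmoid:", grad_magnitude(2, "sigmoid"))
print("depth 20, sigmoid:", grad_magnitude(20, "sigmoid"))
```

With 20 sigmoid layers the gradient magnitude is typically many orders of magnitude smaller than with 2, which is exactly why ReLUs (and careful learning rates) matter in deep stacks.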
Thursday, June 24, 2021
Few rules of thumb for Hyperparameter Tuning
Most machine learning problems require a lot of hyperparameter tuning. Unfortunately, we can't provide concrete tuning rules for every model. Lowering the learning rate can help one model converge efficiently but make another model converge much too slowly. You must experiment to find the best set of hyperparameters for your dataset. That said, here are a few rules of thumb:
- Training loss should steadily decrease, steeply at first, and then more slowly until the slope of the curve reaches or approaches zero.
- If the training loss does not converge, train for more epochs.
- If the training loss decreases too slowly, increase the learning rate. Note that setting the learning rate too high may also prevent training loss from converging.
- If the training loss varies wildly (that is, the training loss jumps around), decrease the learning rate.
- Lowering the learning rate while increasing the number of epochs or the batch size is often a good combination.
- Setting the batch size to a very small value can also cause instability. First, try larger batch sizes; then decrease the batch size until you see degradation.
- For real-world datasets consisting of a very large number of examples, the entire dataset might not fit into memory. In such cases, you'll need to reduce the batch size to enable a batch to fit into memory.
Remember: the ideal combination of hyperparameters is data-dependent, so you must always experiment and verify.
Source: Machine Learning Crash Course by Google
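The learning-rate rules of thumb above are easy to see on a toy problem. This is my own minimal sketch (not from the course): plain gradient descent on f(w) = (w - 3)^2, where a tiny learning rate converges too slowly, a moderate one converges cleanly, and a too-large one makes the loss jump around and diverge.

```python
def descend(lr, steps=100, w0=0.0):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)   # df/dw
        w -= lr * grad
    return w

for lr in (0.001, 0.1, 1.1):
    w = descend(lr)
    print(f"lr={lr}: final w = {w:.3f}, loss = {(w - 3) ** 2:.3e}")
```

With lr=0.001 the loss is still far from zero after 100 steps (too slow); with lr=0.1 it converges to w = 3; with lr=1.1 each step overshoots the minimum and the loss explodes.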
Sunday, April 11, 2021
Advantages & Disadvantages of Sequential and Functional APIs
Both the Sequential API and the Functional API are declarative.
It means you start by declaring which layers you want to use and how they should be connected, and only then can you start feeding the model data for training or inference.
So the advantages of this approach are:
- The model can easily be saved, cloned, and shared.
- Its structure can be displayed and analyzed.
- The framework can infer shapes and check types, so errors can be caught easily (i.e. before any data ever goes through the model).
- It's also fairly easy to debug since the whole model is a static graph of layers.
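To illustrate the "errors caught before any data flows" advantage, here is a toy sketch in plain Python (not Keras itself; `Dense` and `Model` here are hypothetical stand-ins): because the whole model is declared up front as a static graph, shape mismatches can be detected at build time, before training ever starts.

```python
class Dense:
    """Toy layer that only declares its input/output sizes."""
    def __init__(self, in_dim, out_dim):
        self.in_dim, self.out_dim = in_dim, out_dim

class Model:
    """Declarative model: validates the layer graph at build time."""
    def __init__(self, layers):
        # Because the structure is static, a shape mismatch is caught
        # here, before any data ever goes through the model.
        for a, b in zip(layers, layers[1:]):
            if a.out_dim != b.in_dim:
                raise ValueError(f"shape mismatch: {a.out_dim} -> {b.in_dim}")
        self.layers = layers

Model([Dense(4, 16), Dense(16, 1)])        # builds fine
try:
    Model([Dense(4, 16), Dense(8, 1)])     # caught immediately
except ValueError as e:
    print("caught at build time:", e)
```

Keras does essentially this kind of shape inference and type checking when you declare a Sequential or Functional model, which is also what makes such models easy to save, display, and debug.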
Friday, April 2, 2021
Why do we need activation functions?
We use a lot of activation functions in our daily projects, but do we really know why we need an activation function in the first place?
Reason 1:
Well, if you chain several linear transformations, all you get is a linear transformation.
For example, if f(x) = 2x + 3 and g(x) = 5x - 1, then chaining these two linear functions gives you another linear function: f(g(x)) = 2(5x-1)+3 = 10x + 1. So if you don't have some nonlinearity between layers, then even a deep stack of layers is equivalent to a single layer, and you can't solve very complex problems with that.
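The collapse above is easy to verify numerically. A quick sketch checking that f(g(x)) really equals the single linear function 10x + 1 at a few points:

```python
# f(x) = 2x + 3 and g(x) = 5x - 1 chained together...
f = lambda x: 2 * x + 3
g = lambda x: 5 * x - 1
# ...collapse to one linear function: f(g(x)) = 2(5x - 1) + 3 = 10x + 1
h = lambda x: 10 * x + 1

for x in (-2.0, 0.0, 7.5):
    assert f(g(x)) == h(x)
print("two stacked linear layers == one linear layer (10x + 1)")
```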
Reason 2:
If you want to guarantee that the output will always be positive, then you can use the ReLU activation function in the output layer. Alternatively, you can use the "softplus" activation function, which is a smooth variant of ReLU: softplus(z) = log(1+exp(z)). It is close to 0 when z is negative and close to z when z is positive.
Finally, if you want to guarantee that the predictions will fall within a given range of values, then you can use the logistic function or the hyperbolic tangent, and then scale the labels to the appropriate range: 0 to 1 for the logistic function and -1 to 1 for the hyperbolic tangent.
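The output ranges described above can be checked directly. A small sketch of the four output activations mentioned (using only the standard library; note `math.exp` in this simple softplus would overflow for very large z, so it is kept to modest inputs):

```python
import math

def relu(z):     return max(0.0, z)
def softplus(z): return math.log1p(math.exp(z))   # log(1 + e^z), smooth ReLU
def sigmoid(z):  return 1.0 / (1.0 + math.exp(-z))

for z in (-5.0, 0.0, 5.0):
    print(f"z={z:+.1f}  relu={relu(z):.4f}  softplus={softplus(z):.4f}  "
          f"sigmoid={sigmoid(z):.4f}  tanh={math.tanh(z):.4f}")
```

ReLU and softplus are always non-negative (softplus is close to 0 for negative z and close to z for positive z), while sigmoid stays in (0, 1) and tanh in (-1, 1), matching the label-scaling advice above.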
Source: Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, Chapter-10
Thursday, March 11, 2021
Atomic Habits - Bullet key summary
Source: https://images.app.goo.gl/x4FKRsTHtgAHeEx27
- Designing your environment for success:
- Make the cues for your desired habits visible in your work/home environment.
- The TWO-minute rule to start a habit:
- Start any desired habit by dividing it into small segments and completing the segments that take only two minutes.
- Master the entry point (or first step) of the habit.
- Join a community where your desired habit is the normal behavior.
- Use variable rewards:
- Most bad habits have immediate rewards and long-term consequences. Most good habits are the exact opposite.
- Create a reward system.
Tuesday, January 5, 2021
Begin the Journey with Me to Become a Data Scientist
Hi Reader,
This is my first blog post in the field of Data Science and Machine Learning. Before going any further, let me introduce myself. I'm Rahul Kumar, born and raised in Delhi, India. I did my B.Tech in Mechanical Engineering at Delhi Technological University and completed my M.Tech in Renewable Energy at the Indian Institute of Technology Roorkee.
Currently, I have started working at a company whose focus is forecasting solar generation, wind generation, and electricity load using machine learning algorithms.
Before joining this organization, I learned basic Python, introductory machine learning, and statistics. I am still learning, but now the learning curve grows exponentially with time.
Let's picture our journey as a sigmoid curve: I am at the bottom, with knowledge of Python, machine learning, and basic statistics.
We have to reach the top in a very short period of time. For me, that's one year of complete dedication to this.
So let's start this journey with me. I'll keep posting blogs on everything I learn.
I am attaching a Google Sheet where all the resource links are listed. I request you to add more to it.
- Rahul Kumar
Links:
1. https://docs.google.com/spreadsheets/d/1glbDGgU46JZtlqNaX6AiQRx7saj8mWoLoGbXwhtjsIA/edit?usp=sharing