1. Gradients are important:
- If it's differentiable, we can probably learn on it (sketch 1 below).
2. Gradients can vanish:
- Each additional layer can successively reduce signal vs. noise, so the gradients reaching the lower layers shrink toward zero and those layers train slowly or not at all.
- ReLUs are useful here, since they don't saturate the way sigmoid or tanh do (sketch 2 below).
3. Gradients can explode:
- Learning rates are important here: if the rate is too high, updates can blow up (often surfacing as NaNs), so lowering it helps.
- Batch normalisation can also help (sketch 3 below).
4. ReLU layers can die:
- Once a unit's pre-activation is negative for every input, its output and gradient are zero and it stops learning. Keep calm and lower your learning rates (sketch 4 below).
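
Sketch 1: a minimal, hedged example of point 1, learning on a differentiable function with plain gradient descent. The objective f(x) = (x - 3)^2, the learning rate and the step count are illustrative choices, not taken from the course.

```python
import torch

# A single learnable scalar and a differentiable objective f(x) = (x - 3)^2.
x = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

for _ in range(100):
    loss = (x - 3.0) ** 2      # differentiable, so autograd gives us a gradient
    optimizer.zero_grad()
    loss.backward()            # compute df/dx
    optimizer.step()           # move x against the gradient

print(x.item())  # approaches 3.0, the minimiser of f
```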
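
Sketch 2: point 2 in code. A deep stack of saturating (sigmoid) layers lets far less gradient reach the first layer than the same stack built with ReLUs. Depth, width and the random data are arbitrary assumptions, and the initialisation is kept well scaled so the contrast is about the activation function, not the weights.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(activation_cls, depth=20, width=32):
    """Gradient norm at the first layer of a deep stack of activation_cls layers."""
    torch.manual_seed(0)
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width)
        nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
        layers += [linear, activation_cls()]
    net = nn.Sequential(*layers, nn.Linear(width, 1))

    x = torch.randn(64, width)
    net(x).pow(2).mean().backward()
    return net[0].weight.grad.norm().item()   # signal reaching the first layer

print("sigmoid:", first_layer_grad_norm(nn.Sigmoid))  # vanishingly small
print("relu:   ", first_layer_grad_norm(nn.ReLU))     # many orders of magnitude larger
```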
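
Sketch 3: point 3 in code. With deliberately oversized weights, the gradient at the first layer of a deep stack blows up; inserting batch normalisation between the layers keeps activations, and therefore gradients, in a sane range. (The other remedy, a smaller learning rate, simply scales down each update.) Depth, width and the std=1.0 initialisation are illustrative assumptions.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(use_batchnorm, depth=10, width=32):
    torch.manual_seed(0)
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width)
        nn.init.normal_(linear.weight, std=1.0)   # deliberately oversized weights
        layers.append(linear)
        if use_batchnorm:
            layers.append(nn.BatchNorm1d(width))  # re-centre and re-scale each layer
        layers.append(nn.ReLU())
    net = nn.Sequential(*layers, nn.Linear(width, 1))

    x = torch.randn(64, width)
    net(x).pow(2).mean().backward()
    return net[0].weight.grad.norm().item()

print("without batch norm:", first_layer_grad_norm(False))  # huge
print("with batch norm:   ", first_layer_grad_norm(True))   # moderate
```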
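
Sketch 4: point 4 in code. Rather than running a full training loop, this shows the dead state directly: once a ReLU unit's pre-activation is negative for every input (here simulated by a hand-set, very negative bias, the kind of place one oversized gradient step can leave it), its output and its gradient are both zero, so it can never recover. The shapes and the -100 bias are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(10, 4)
with torch.no_grad():
    layer.bias[0] = -100.0        # simulate a unit knocked far into the negative region

x = torch.randn(256, 10)
out = torch.relu(layer(x))        # unit 0's pre-activation is negative for every input
loss = out.sum()
loss.backward()

print("unit 0 always outputs 0:", bool((out[:, 0] == 0).all()))
print("unit 0 weight gradient: ", layer.weight.grad[0])   # all zeros, so it cannot learn
print("unit 0 bias gradient:   ", layer.bias.grad[0])     # zero as well
```

A smaller learning rate keeps weight updates from overshooting into this region in the first place.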
Source: Machine Learning Crash Course by Google.