1. Gradients are important:
- If it's differentiable, we can probably learn on it (sketch 1 below).
2. Gradients can vanish:
- Each additional layer can successively reduce signal vs. noise, so the gradients reaching the lower layers shrink toward zero and those layers train slowly or not at all.
- ReLUs are useful here, since they don't saturate the way sigmoid or tanh do (sketch 2 below).
3. Gradients can explode:
- Learning rates are important here: if the rate is too high, updates can blow up (often surfacing as NaNs), so lowering it helps.
- Batch normalisation can also help (sketch 3 below).
4. ReLU layers can die:
- Once a unit's pre-activation is negative for every input, its output and gradient are zero and it stops learning. Keep calm and lower your learning rates (sketch 4 below).
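
Sketch 1: a minimal, hedged example of point 1, learning on a differentiable function with plain gradient descent. The objective f(x) = (x - 3)^2, the learning rate and the step count are illustrative choices, not taken from the course.

```python
import torch

# A single learnable scalar and a differentiable objective f(x) = (x - 3)^2.
x = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

for _ in range(100):
    loss = (x - 3.0) ** 2      # differentiable, so autograd gives us a gradient
    optimizer.zero_grad()
    loss.backward()            # compute df/dx
    optimizer.step()           # move x against the gradient

print(x.item())  # approaches 3.0, the minimiser of f
```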
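
Sketch 2: point 2 in code. A deep stack of saturating (sigmoid) layers lets far less gradient reach the first layer than the same stack built with ReLUs. Depth, width and the random data are arbitrary assumptions, and the initialisation is kept well scaled so the contrast is about the activation function, not the weights.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(activation_cls, depth=20, width=32):
    """Gradient norm at the first layer of a deep stack of activation_cls layers."""
    torch.manual_seed(0)
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width)
        nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
        layers += [linear, activation_cls()]
    net = nn.Sequential(*layers, nn.Linear(width, 1))

    x = torch.randn(64, width)
    net(x).pow(2).mean().backward()
    return net[0].weight.grad.norm().item()   # signal reaching the first layer

print("sigmoid:", first_layer_grad_norm(nn.Sigmoid))  # vanishingly small
print("relu:   ", first_layer_grad_norm(nn.ReLU))     # many orders of magnitude larger
```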
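
Sketch 3: point 3 in code. With deliberately oversized weights, the gradient at the first layer of a deep stack blows up; inserting batch normalisation between the layers keeps activations, and therefore gradients, in a sane range. (The other remedy, a smaller learning rate, simply scales down each update.) Depth, width and the std=1.0 initialisation are illustrative assumptions.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(use_batchnorm, depth=10, width=32):
    torch.manual_seed(0)
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width)
        nn.init.normal_(linear.weight, std=1.0)   # deliberately oversized weights
        layers.append(linear)
        if use_batchnorm:
            layers.append(nn.BatchNorm1d(width))  # re-centre and re-scale each layer
        layers.append(nn.ReLU())
    net = nn.Sequential(*layers, nn.Linear(width, 1))

    x = torch.randn(64, width)
    net(x).pow(2).mean().backward()
    return net[0].weight.grad.norm().item()

print("without batch norm:", first_layer_grad_norm(False))  # huge
print("with batch norm:   ", first_layer_grad_norm(True))   # moderate
```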
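
Sketch 4: point 4 in code. Rather than running a full training loop, this shows the dead state directly: once a ReLU unit's pre-activation is negative for every input (here simulated by a hand-set, very negative bias, the kind of place one oversized gradient step can leave it), its output and its gradient are both zero, so it can never recover. The shapes and the -100 bias are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(10, 4)
with torch.no_grad():
    layer.bias[0] = -100.0        # simulate a unit knocked far into the negative region

x = torch.randn(256, 10)
out = torch.relu(layer(x))        # unit 0's pre-activation is negative for every input
loss = out.sum()
loss.backward()

print("unit 0 always outputs 0:", bool((out[:, 0] == 0).all()))
print("unit 0 weight gradient: ", layer.weight.grad[0])   # all zeros, so it cannot learn
print("unit 0 bias gradient:   ", layer.bias.grad[0])     # zero as well
```

A smaller learning rate keeps weight updates from overshooting into this region in the first place.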
Source: Machine Learning Crash Course by Google.