About Me

My photo
Hi! I am Rahul. After years of bouncing around different sectors, I've been specializing in Python, Machine Learning, Deep Learning, NLP, and Statistics. Being a Technology Lover, I strongly believe 'Nothing can stress or stop you from achieving your dreams if you cherish hope more than your fears.'

Thursday, June 24, 2021

Few rules of thumb for Hyperparameter Tuning

Most machine learning problems require a lot of hyperparameter tuning. Unfortunately, we can't provide concrete tuning rules for every model. Lowering the learning rate can help one model converge efficiently but make another model converge much too slowly. You must experiment to find the best set of hyperparameters for your dataset. That said, here are a few rules of thumb:

  • Training loss should steadily decrease, steeply at first, and then more slowly until the slope of the curve reaches or approaches zero.
  • If the training loss does not converge, train for more epochs.
  • If the training loss decreases too slowly, increase the learning rate. Note that setting the learning rate too high may also prevent training loss from converging.
  • If the training loss varies wildly (that is, the training loss jumps around), decrease the learning rate.
  • Lowering the learning rate while increasing the number of epochs or the batch size is often a good combination.
  • Setting the batch size to a very small batch number can also cause instability. First, try large batch size values. Then, decrease the batch size until you see degradation.
  • For real-world datasets consisting of a very large number of examples, the entire dataset might not fit into memory. In such cases, you'll need to reduce the batch size to enable a batch to fit into memory.

Remember: the ideal combination of hyperparameters is data-dependent, so you must always experiment and verify.


Source: Machine Learning Crash Course by Google

No comments:

Problems with Sigmoid and Tanh activation functions

The Sigmoid activation function is also known as the Logistic function . The input to the function is transformed into a value between 0 an...