Model Training Patterns - Hyperparameter Tuning
Manoj Kumar Patra
Posted on January 2, 2022
Model parameters vs Hyperparameters
Model parameters are the weights and biases the model learns as it goes through training iterations.
Hyperparameters, on the other hand, are parameters that we as model builders can control.
Types of hyperparameters
Model architecture hyperparameters - hyperparameters that control the model's underlying mathematical function, such as the number of layers or the number of neurons per layer
Model training hyperparameters - hyperparameters that control the training loop and the way the optimizer works, such as the learning rate, batch size, and number of epochs
Finding the best possible values for hyperparameters
Grid search
- Define a set of values for each hyperparameter that you want to optimize
- Use grid search - it tries every combination of the specified values and returns the combination that yields the best evaluation metric for the model
Problems with this approach
- As the number of hyperparameters and the number of values per hyperparameter grow, the number of combinations (and the time required to try them all) grows multiplicatively => combinatorial explosion.
- It's a brute-force approach => it doesn't learn from previous trials. It keeps trying combinations even past the point of diminishing returns, say, where the error starts increasing instead of decreasing.
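The procedure can be sketched in a few lines of plain Python. The two hyperparameter names, their candidate values, and the validation_error function (a cheap stand-in for a full training run) are all hypothetical:

```python
import itertools

# Toy "training" objective: pretend validation error as a function of
# two hyperparameters (hypothetical names and ranges, for illustration)
def validation_error(learning_rate, batch_size):
    return (learning_rate - 0.01) ** 2 + (batch_size - 64) ** 2 / 1e4

# Define a set of candidate values for each hyperparameter
grid = {
    "learning_rate": [0.001, 0.005, 0.01, 0.05],
    "batch_size": [16, 32, 64, 128],
}

# Try every combination (4 x 4 = 16 here - this is the combinatorial
# explosion) and keep the one with the lowest error
best_combo, best_err = None, float("inf")
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    err = validation_error(**params)
    if err < best_err:
        best_combo, best_err = params, err

print(best_combo)  # {'learning_rate': 0.01, 'batch_size': 64}
```

In a real pipeline, validation_error would be replaced by a full training loop, which is exactly why this approach gets expensive.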
Randomized search
A faster alternative to grid search.
Unlike grid search, this approach randomly samples values for each hyperparameter and tries each sampled combination.
- Define a range of values for each hyperparameter that you want to optimize
- Specify the number of times you want to randomly sample a combination of values
- Run the random search
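A minimal sketch of the same idea, again with hypothetical hyperparameter names and ranges and the same toy validation_error in place of real training. Note that we define ranges rather than fixed sets, plus a sampling budget:

```python
import random

random.seed(0)  # for reproducibility of the random samples

# Same toy stand-in for a real training run as before
def validation_error(learning_rate, batch_size):
    return (learning_rate - 0.01) ** 2 + (batch_size - 64) ** 2 / 1e4

# Define a RANGE (not a fixed set) for each hyperparameter
ranges = {"learning_rate": (0.0001, 0.1), "batch_size": (8, 256)}
n_samples = 25  # how many random combinations to try

best_combo, best_err = None, float("inf")
for _ in range(n_samples):
    params = {
        "learning_rate": random.uniform(*ranges["learning_rate"]),
        "batch_size": random.randint(*ranges["batch_size"]),
    }
    err = validation_error(**params)
    if err < best_err:
        best_combo, best_err = params, err

print(best_combo, best_err)
```

The cost is now bounded by n_samples regardless of how many hyperparameters we add, but the trials still don't learn from one another.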
The keras-tuner library
This library provides a solution that scales and learns from previous trials to find an optimal combination of hyperparameter values.
EXAMPLE - tuning the number of neurons in the first and second hidden layers, and the learning rate, of an MNIST classification model
import keras_tuner as kt
from tensorflow import keras

# Load and normalize the MNIST data used by tuner.search() below
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model(hp):
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(units=hp.Int('first_hidden', min_value=32,
                                        max_value=256, step=32),
                           activation='relu'),
        keras.layers.Dense(units=hp.Int('second_hidden', min_value=32,
                                        max_value=256, step=32),
                           activation='relu'),
        keras.layers.Dense(units=10, activation='softmax')
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', min_value=0.005, max_value=0.01,
                     sampling='log')),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model

tuner = kt.BayesianOptimization(
    build_model,
    objective='val_accuracy',
    max_trials=10,
)
tuner.search(x_train, y_train, validation_split=0.1, epochs=10)
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
Bayesian Optimization (B.O.)
Goal of this optimization approach - call the objective function (the process of training the ML model) as few times as possible, since it is a costly operation.
One issue with the approaches above is that every new set of hyperparameters means running the model through an entire training loop. This is what Bayesian optimization tries to avoid.
How does this work?
- Choose hyperparameters that need optimization
- Define a range of values for these hyperparameters
- Define the objective function
- Bayesian optimization uses this objective function to build a new function that emulates our model but is much cheaper to run (the surrogate function)
- The surrogate function is used by B.O. to find the most promising combination of hyperparameters
- Once that combination is found, the model is run through a full training loop using these values
- The result of that training run is fed back into the surrogate function, and the process is repeated for the specified number of trials (max_trials in keras-tuner)
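The loop above can be sketched in a toy form. This is not how keras-tuner implements it: real B.O. libraries use a Gaussian process (or similar probabilistic model) as the surrogate, while this sketch substitutes a simple quadratic fit, and the one-hyperparameter objective function is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Expensive objective: stand-in for a full training run that returns
# validation loss as a function of one hyperparameter (hypothetical)
def objective(lr):
    return (lr - 0.3) ** 2 + 0.01 * np.sin(20 * lr)

lo, hi = 0.0, 1.0
# Seed the surrogate with a few real (expensive) evaluations
xs = list(rng.uniform(lo, hi, 3))
ys = [objective(x) for x in xs]

for trial in range(10):  # the number of trials
    # Fit a cheap surrogate to everything observed so far
    # (a quadratic here, standing in for a Gaussian process)
    coeffs = np.polyfit(xs, ys, deg=2)
    candidates = np.linspace(lo, hi, 200)
    surrogate = np.polyval(coeffs, candidates)
    # Pick the candidate the surrogate predicts is best...
    x_next = candidates[int(np.argmin(surrogate))]
    # ...run the expensive objective once, and feed the result back
    xs.append(x_next)
    ys.append(objective(x_next))

best = xs[int(np.argmin(ys))]
print(best)
```

The expensive objective is called only 13 times in total, while the cheap surrogate is evaluated thousands of times, which is the core trade-off B.O. exploits.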