Building a Custom Grid Search for Your Custom Model

Bünyamin Ergen
7 min read · Apr 18, 2023


[Image: Grid Search 3D]

Introduction

In the ever-evolving world of data science, custom projects are becoming increasingly common. These unique tasks can be both exciting and challenging, requiring custom functions and code to tackle the problems at hand. One essential aspect of these projects is optimizing the model’s parameters, which is where a custom grid search comes into play. In this article, we will delve into the logic of grid search and demonstrate how to build a custom grid search for your own use.

Understanding Grid Search Logic

Grid search is a procedure that repeatedly trains a model with different combinations of hyperparameters to find the best-performing set. For instance, if you have a Random Forest Regressor and want to determine its best hyperparameters, grid search will enumerate every combination of the parameter grid you provide:

grid_search_params = {
    'n_estimators': [10, 20],
    'max_depth': [40, 50],
    'min_samples_split': [2, 3],
    'min_samples_leaf': [4, 5]
}

Combinations of Parameters:

{'n_estimators': 10, 'max_depth': 40, 'min_samples_split': 2, 'min_samples_leaf': 4}
{'n_estimators': 10, 'max_depth': 40, 'min_samples_split': 2, 'min_samples_leaf': 5}
{'n_estimators': 10, 'max_depth': 40, 'min_samples_split': 3, 'min_samples_leaf': 4}
{'n_estimators': 10, 'max_depth': 40, 'min_samples_split': 3, 'min_samples_leaf': 5}
{'n_estimators': 10, 'max_depth': 50, 'min_samples_split': 2, 'min_samples_leaf': 4}
{'n_estimators': 10, 'max_depth': 50, 'min_samples_split': 2, 'min_samples_leaf': 5}
{'n_estimators': 10, 'max_depth': 50, 'min_samples_split': 3, 'min_samples_leaf': 4}
{'n_estimators': 10, 'max_depth': 50, 'min_samples_split': 3, 'min_samples_leaf': 5}
{'n_estimators': 20, 'max_depth': 40, 'min_samples_split': 2, 'min_samples_leaf': 4}
{'n_estimators': 20, 'max_depth': 40, 'min_samples_split': 2, 'min_samples_leaf': 5}
{'n_estimators': 20, 'max_depth': 40, 'min_samples_split': 3, 'min_samples_leaf': 4}
{'n_estimators': 20, 'max_depth': 40, 'min_samples_split': 3, 'min_samples_leaf': 5}
{'n_estimators': 20, 'max_depth': 50, 'min_samples_split': 2, 'min_samples_leaf': 4}
{'n_estimators': 20, 'max_depth': 50, 'min_samples_split': 2, 'min_samples_leaf': 5}
{'n_estimators': 20, 'max_depth': 50, 'min_samples_split': 3, 'min_samples_leaf': 4}
{'n_estimators': 20, 'max_depth': 50, 'min_samples_split': 3, 'min_samples_leaf': 5}

The grid search will then train the model using all possible parameter combinations to identify the best one. However, understanding the logic is not enough; you need to implement it effectively in your projects.
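
To see where these combinations come from, here is a minimal sketch that enumerates the Cartesian product of the parameter grid with itertools.product; the complete code at the end of this article uses the same idea.

import itertools

grid_search_params = {
    'n_estimators': [10, 20],
    'max_depth': [40, 50],
    'min_samples_split': [2, 3],
    'min_samples_leaf': [4, 5]
}

# Cartesian product of the value lists: 2 * 2 * 2 * 2 = 16 parameter sets
param_sets = [
    dict(zip(grid_search_params.keys(), values))
    for values in itertools.product(*grid_search_params.values())
]

for params in param_sets:
    print(params)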

Enhancing the Grid Search Process

To improve the grid search process, follow these steps (a condensed sketch of the resulting structure appears after the list):

  1. Utilize itertools to match all parameter combinations.
  2. Calculate error metrics for evaluation, using custom functions if needed.
  3. Implement k-fold cross-validation for a more robust evaluation.
  4. Train the model with all parameter combinations and utilize parallel programming for efficient computation. Include an option to stop the process at any time with KeyboardInterrupt.
  5. Return all combination results, best parameters, and scores.
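
Taken together, these steps reduce to a compact loop: enumerate the parameter sets, evaluate each one on every fold in parallel, average the fold errors, and keep the best result. The sketch below is only an illustration of that structure, not the article's final implementation: the function name grid_search_skeleton, the score_fn argument, and the use of scikit-learn's KFold for index splitting are assumptions made here for brevity, and the KeyboardInterrupt handling from step 4 is omitted. The Complete Code section implements its own fold splitting and error metrics.

import itertools
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from sklearn.model_selection import KFold

def grid_search_skeleton(Model, X, y, param_grid, score_fn, cv=5, n_workers=4):
    # Step 1: enumerate every parameter combination with itertools.product
    param_sets = [dict(zip(param_grid, values))
                  for values in itertools.product(*param_grid.values())]

    def fit_and_score(params, train_idx, test_idx):
        # Steps 2 and 4: train one model on one fold and score its predictions
        model = Model(**params).fit(X[train_idx], y[train_idx])
        return score_fn(y[test_idx], model.predict(X[test_idx]))

    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        for params in param_sets:
            # Step 3: evaluate the same parameter set on every cross-validation fold
            folds = KFold(n_splits=cv, shuffle=True, random_state=0).split(X)
            futures = [executor.submit(fit_and_score, params, tr, te) for tr, te in folds]
            results.append((params, np.mean([f.result() for f in futures])))

    # Step 5: return every result plus the best parameter set (lower error = better here)
    best_params, best_error = min(results, key=lambda r: r[1])
    return results, best_params, best_error

# Hypothetical usage:
# from sklearn.ensemble import RandomForestRegressor
# from sklearn.metrics import mean_squared_error
# results, best_params, best_error = grid_search_skeleton(
#     RandomForestRegressor, X, y, grid_search_params, mean_squared_error)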

Implementation and Usage

The full code for building a custom grid search is available in a GitHub gist linked in the comments below. Although the author's custom model is not used as an example due to confidentiality, you can easily apply this custom grid search function to any dataset, given a parameter grid and the model's actual and predicted values.

This custom grid search is compatible with scikit-learn models and can be adapted for other projects. Although scikit-learn already provides grid search tooling such as GridSearchCV, creating your own can be a valuable learning experience and offers greater flexibility.
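
For comparison, the same search over the grid_search_params, X, and y defined in the Complete Code section can be run with scikit-learn's built-in GridSearchCV. Note that scikit-learn maximizes scores, so error metrics are passed in negated form, such as "neg_mean_absolute_error".

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Built-in equivalent of the custom search, scoring with negative MAE
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid=grid_search_params,
    cv=5,
    scoring="neg_mean_absolute_error",
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)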

Conclusion

Creating a custom grid search for your unique model is an essential step in optimizing your model’s parameters. By understanding the logic behind grid search and implementing it effectively, you can enhance the performance of your model and tackle complex projects with confidence. So go ahead and apply this custom grid search approach to your own projects, and enjoy the challenge of conquering new data science tasks!

Complete Code:

import sys
import os
import itertools
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from sklearn.ensemble import RandomForestRegressor

def custom_grid_search(Model, X, y, GridSearchParams, cv=5, ParallelUnits=1, error_metrics="mse", random_state=None):

    """
    Perform a grid search on the given model using the specified parameters,
    returning the results, the best parameters, and the best score.

    Parameters
    ----------
    Model : sklearn model
        The model to perform the grid search on
    X : array-like, shape (n_samples, n_features)
        The training input samples
    y : array-like, shape (n_samples,)
        The target values
    GridSearchParams : dict
        The parameters to search over, with keys as the parameter names and
        values as a list of possible values
    cv : int, optional (default=5)
        The number of cross-validation splits to perform
    ParallelUnits : int, optional (default=1)
        The number of parallel workers to use. If -1, use all available CPU cores
    error_metrics : str, optional (default="mse")
        The error metric to use when determining the best parameters. Can be
        one of "mae", "mse", "rmse", or "r2"
    random_state : int, optional (default=None)
        Seed for random number generator

    Returns
    -------
    results : list
        A list of tuples, each containing a dictionary of parameters and the
        corresponding average error
    best_params : dict
        The best parameters found
    error : float
        The best score found
    """

    # Generator that splits the data into k folds for cross-validation
    def k_fold_split(X, y, n_splits, random_state=None):

        """
        Split the data into train and test sets for k-fold cross-validation.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            The input data
        y : array-like, shape (n_samples,)
            The target data
        n_splits : int
            The number of folds to split the data into
        random_state : int, optional (default=None)
            Seed for random number generator

        Returns
        -------
        generator
            A generator that yields the indices of the train and test sets for
            each fold
        """

        # Create an array of indices
        indices = np.arange(len(y))

        # Set the random seed if specified
        if random_state is not None:
            np.random.seed(random_state)

        # Shuffle the indices
        np.random.shuffle(indices)

        # Calculate the size of each fold
        fold_sizes = np.full(n_splits, len(y) // n_splits, dtype=int)
        fold_sizes[:len(y) % n_splits] += 1

        # Initialize the start position
        current = 0

        # Yield the train/test indices for each fold, advancing the window each time
        for fold_size in fold_sizes:
            start, stop = current, current + fold_size
            test_indices = indices[start:stop]
            train_indices = np.concatenate((indices[:start], indices[stop:]))
            yield train_indices, test_indices
            current = stop

    # Mean absolute error calculation function
    def mean_absolute_error(y_true, y_pred):

        """
        Calculate the mean absolute error between two arrays of target values.

        Parameters
        ----------
        y_true : array-like, shape (n_samples,)
            The true target values
        y_pred : array-like, shape (n_samples,)
            The predicted target values

        Returns
        -------
        float
            The mean absolute error
        """

        return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)

    # Mean squared error calculation function
    def mean_squared_error(y_true, y_pred):

        """
        Calculate the mean squared error between two arrays of target values.

        Parameters
        ----------
        y_true : array-like, shape (n_samples,)
            The true target values
        y_pred : array-like, shape (n_samples,)
            The predicted target values

        Returns
        -------
        float
            The mean squared error
        """

        return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

    # Root mean squared error calculation function
    def root_mean_squared_error(y_true, y_pred):

        """
        Calculate the root mean squared error between two arrays of target values.

        Parameters
        ----------
        y_true : array-like, shape (n_samples,)
            The true target values
        y_pred : array-like, shape (n_samples,)
            The predicted target values

        Returns
        -------
        float
            The root mean squared error
        """

        return mean_squared_error(y_true, y_pred) ** 0.5

    # R^2 score calculation function
    def r2_score(y_true, y_pred):

        """
        Calculate the R^2 score between two arrays of target values.

        Parameters
        ----------
        y_true : array-like, shape (n_samples,)
            The true target values
        y_pred : array-like, shape (n_samples,)
            The predicted target values

        Returns
        -------
        float
            The R^2 score
        """

        mean_true = sum(y_true) / len(y_true)
        total_var = sum((yt - mean_true) ** 2 for yt in y_true)
        residual_var = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
        return 1 - (residual_var / total_var)

    # Dictionary mapping error metric names to their corresponding functions
    error_metrics_dict = {
        "mae": mean_absolute_error,
        "mse": mean_squared_error,
        "rmse": root_mean_squared_error,
        "r2": r2_score
    }

    # Train the model using the specified parameters and calculate the error
    def process_param_set(args, Model, error_metrics_dict):

        """
        Train the model using the specified parameters and calculate the error.

        Parameters
        ----------
        args : tuple
            A tuple containing the parameters, input data, target data, error metric,
            train indices, and test indices
        Model : custom model
            The model to train
        error_metrics_dict : dict
            A dictionary mapping error metric names to their corresponding functions

        Returns
        -------
        tuple
            A tuple of the parameters and the calculated error
        """

        params, X, y, error_metrics, train_indices, test_indices = args

        # Split the data into training and testing sets using the indices
        X_train, X_test = X[train_indices], X[test_indices]
        y_train, y_test = y[train_indices], y[test_indices]

        # Train the model using the specified parameters
        model = Model(**params)
        model.fit(X_train, y_train)

        # Predict using the trained model
        y_pred = model.predict(X_test)

        # Calculate the error using the specified error metric
        error = error_metrics_dict[error_metrics](y_test, y_pred)

        return params, error

    # Initialize variables to store the results
    Results = []
    Keys = GridSearchParams.keys()
    MinError = sys.maxsize
    MaxScore = -sys.maxsize
    BestParams = {}
    ParamSets = [dict(zip(Keys, values)) for values in itertools.product(*GridSearchParams.values())]

    # Check the number of parallel units
    if ParallelUnits == -1:
        ParallelUnits = os.cpu_count()
    elif ParallelUnits > os.cpu_count():
        raise RuntimeError(f"Maximum number of CPUs available: {os.cpu_count()}")

    # Start the thread pool executor
    try:
        with ThreadPoolExecutor(max_workers=ParallelUnits) as executor:
            for params in ParamSets:
                total_error = 0
                futures = []

                # Submit one training job per fold for the current parameter set
                for train_indices, test_indices in k_fold_split(X, y, cv, random_state):
                    future = executor.submit(process_param_set,
                                             (params, X, y, error_metrics, train_indices, test_indices),
                                             Model, error_metrics_dict)
                    futures.append(future)

                # Collect the per-fold errors and average them
                for future in futures:
                    _, error = future.result()
                    total_error += error

                avg_error = total_error / cv
                Results.append((params, avg_error))

                # Track the best parameters (higher is better for R^2, lower for error metrics)
                if error_metrics == "r2":
                    if avg_error > MaxScore:
                        MaxScore = avg_error
                        BestParams = params
                else:
                    if avg_error < MinError:
                        MinError = avg_error
                        BestParams = params

    # Handle keyboard interrupt to stop the execution
    except KeyboardInterrupt:
        print("CTRL+C detected. Terminating the process...")
        executor.shutdown(wait=False)

    # Return the results, best parameters, and best score
    return Results, BestParams, MaxScore if error_metrics == "r2" else MinError

# Simple dataset
np.random.seed(42)
X = np.random.rand(100, 2)
y = 3 * X[:, 0] + 2 * X[:, 1] + np.random.randn(100)

# Parameter grid
grid_search_params = {
    'n_estimators': [10, 20],
    'max_depth': [40, 50],
    'min_samples_split': [2, 3],
    'min_samples_leaf': [4, 5]
}

# Search for the best parameters
results, best_params, error = custom_grid_search(RandomForestRegressor, X, y, grid_search_params, cv=5, ParallelUnits=-1, error_metrics="mae", random_state=42)

# Print results
print("Results:")
for res in results:
    print(res)

print("Best Parameters:", best_params)
print("Error:", error)

Bunyamin Ergen

🌐 www.bunyaminergen.com

linkedin.com/bunyaminergen
github.com/bunyaminergen
kaggle.com/bunyaminergen
instagram.com/bunyaminergen
facebook.com/bunyaminergenoffical
twitter.com/bergenoffical
youtube.com/bunyaminergen
