2 years ago
#50468
Swann_
GridSearchCV best_estimator_ parameter doesn't have the same value as the fitted model when using pipeline indexing (also uses sequential feature selection)
The idea is to run a grid search over all candidate values of lambda (alpha in scikit-learn's Ridge), where each value of lambda gives its own best subset of features. In other words, I'm trying to do hyperparameter tuning (lambda) and feature selection at the same time. Any advice is greatly appreciated, thank you so much!
ISSUE:
gs_cv.best_estimator_[0].estimator.alpha is 0.0, while gs_cv.best_estimator_[1].alpha is 1.0 (both read via pipeline indexing).
The best parameter found by the grid search doesn't seem to be applied to the model step of the pipeline.
This is what I get from print(gs_cv.best_estimator_.named_steps). The final Ridge() still uses the default lambda of 1:
{'sfs_ridge': SequentialFeatureSelector(estimator=Ridge(alpha=0.0), k_features=5, scoring='r2'), 'ridge_regression': Ridge()}
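A possible explanation, as far as I can tell: GridSearchCV works on a clone of the pipeline, and cloning copies the Ridge inside the selector and the Ridge in the final step separately, so a grid key like sfs_ridge__estimator__alpha only reaches the first one. The sketch below tries to show that in isolation. It assumes scikit-learn's own SequentialFeatureSelector (with n_features_to_select) as a stand-in for the mlxtend one, and `candidate` is just a hypothetical name for the cloned pipeline.

```python
from sklearn.base import clone
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Same layout as in the question: one Ridge instance passed to both steps.
# Assumption: scikit-learn's selector is used here instead of mlxtend's.
ridge = Ridge()
pipe = Pipeline([
    ("sfs_ridge", SequentialFeatureSelector(ridge, n_features_to_select=5)),
    ("ridge_regression", ridge),
])

# Cloning (which the grid search does for every candidate) gives each step
# its own independent copy of the estimator.
candidate = clone(pipe)
candidate.set_params(sfs_ridge__estimator__alpha=0.0)

print(candidate.named_steps["sfs_ridge"].estimator.alpha)  # 0.0
print(candidate.named_steps["ridge_regression"].alpha)     # 1.0 (untouched default)
```

If this reading is right, the final step would need its own ridge_regression__alpha entry in the grid to be tuned at all.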
------------Code------------------
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
#Model
ridge = Ridge()
#hyperparameter_alpha = np.logspace(-6,6, num=5)
#SFS model
sfs_ridge = SFS(estimator=ridge, k_features = 5, forward=True, floating=False, scoring='r2', cv = 5)
#Pipeline model
pipe = Pipeline([ ('sfs_ridge', sfs_ridge), ('ridge_regression', ridge) ])
#GridSearchCV
#Each key in param_grid must start with the step name given when defining the pipeline, followed by a double underscore
param_grid = [ {'sfs_ridge__k_features': [2,4,5] ,'sfs_ridge__estimator__alpha': np.arange(0,1,0.05) }]
gs_cv = GridSearchCV(estimator= pipe, param_grid= param_grid, scoring="neg_mean_absolute_error", n_jobs = -1, cv=5, refit=True)
gs_cv.fit(X_train, y_train)
print(gs_cv.best_estimator_[0].estimator.alpha)  # prints 0.0
print(gs_cv.best_estimator_[1].alpha)  # prints 1.0
print(gs_cv.best_estimator_[0].k_feature_idx_)
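One workaround that might tie the two alphas together is to pass param_grid as a list of dicts, one per alpha, so the selector's Ridge and the final Ridge always get the same value. This is only a sketch under assumptions: it swaps in scikit-learn's SequentialFeatureSelector (n_features_to_select instead of k_features) for the mlxtend one, and it uses a small illustrative `alphas` list and cv=3 to keep the run short.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Assumption: scikit-learn's selector stands in for mlxtend's SFS.
pipe = Pipeline([
    ("sfs_ridge", SequentialFeatureSelector(
        Ridge(), n_features_to_select=5, direction="forward",
        scoring="r2", cv=3)),
    ("ridge_regression", Ridge()),
])

# Illustrative values only; one dict per alpha keeps both steps in sync.
alphas = [0.05, 0.5]
param_grid = [
    {
        "sfs_ridge__n_features_to_select": [2, 4, 5],
        "sfs_ridge__estimator__alpha": [a],
        "ridge_regression__alpha": [a],
    }
    for a in alphas
]

gs_cv = GridSearchCV(pipe, param_grid, scoring="neg_mean_absolute_error",
                     cv=3, refit=True)
gs_cv.fit(X_train, y_train)

best = gs_cv.best_estimator_
# Both steps should now report the same alpha.
print(best.named_steps["sfs_ridge"].estimator.alpha)
print(best.named_steps["ridge_regression"].alpha)
# Indices of the selected features.
print(best.named_steps["sfs_ridge"].get_support(indices=True))
```

With a single dict containing both alpha keys, the grid would instead try every combination of the two alphas, which is a different search from the one described above.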
python
pipeline
feature-selection
grid-search
hyperparameters
0 Answers