2 years ago

#50468

Swann_

GridSearchCV best_estimator_ parameter doesn't have the same value as the fitted model when using pipeline indexing (with sequential feature selection)

The idea is to run a grid search over all candidate values of lambda (alpha in scikit-learn), where each value of lambda yields its own best subset of features. In other words, I'm trying to do hyperparameter tuning (lambda) and feature selection at the same time. Any advice is greatly appreciated. Thank you so much!

ISSUE:

  1. gs_cv.best_estimator_[0].estimator.alpha = 0.0, while gs_cv.best_estimator_[1].alpha = 1.0 (pipeline indexing results).

  2. The best parameter from GridSearchCV doesn't seem to be applied to the model step of the pipeline, as seen in the image.

  3. I get the following from print(gs_cv.best_estimator_.named_steps). The Ridge() step still uses the default lambda (alpha) of 1:

    {'sfs_ridge': SequentialFeatureSelector(estimator=Ridge(alpha=0.0), k_features=5, scoring='r2'), 'ridge_regression': Ridge()}
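A possible explanation, sketched under assumptions: GridSearchCV clones the pipeline before setting each candidate's parameters, so the Ridge inside the selector and the Ridge in the final step become two separate objects, and a key such as `sfs_ridge__estimator__alpha` only reaches the first one. The sketch below uses scikit-learn's own `SequentialFeatureSelector` as a stand-in for the mlxtend one (a substitution made so the example needs only scikit-learn), and the name `candidate` is illustrative.

```python
from sklearn.base import clone
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Same layout as in the question: one Ridge instance passed to both steps.
# Stand-in assumption: scikit-learn's selector replaces mlxtend's SFS here.
ridge = Ridge()
pipe = Pipeline([
    ("sfs_ridge", SequentialFeatureSelector(ridge, n_features_to_select=5)),
    ("ridge_regression", ridge),
])

# clone() rebuilds every step, so the two Ridge objects are now independent.
candidate = clone(pipe)

# This key addresses only the estimator nested inside the selector step.
candidate.set_params(sfs_ridge__estimator__alpha=0.0)

selector_alpha = candidate.named_steps["sfs_ridge"].estimator.alpha
final_alpha = candidate.named_steps["ridge_regression"].alpha
print(selector_alpha, final_alpha)
```

Nothing is fitted here; the point is only that the final step keeps its default alpha unless the grid also contains a `ridge_regression__alpha` entry.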

------------Code------------------

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.model_selection import train_test_split

from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

#Model
ridge = Ridge()
#hyperparameter_alpha = np.logspace(-6,6, num=5)

#SFS model
sfs_ridge = SFS(estimator=ridge, k_features = 5, forward=True, floating=False, scoring='r2', cv = 5)

#Pipeline model
pipe = Pipeline([ ('sfs_ridge', sfs_ridge), ('ridge_regression', ridge)  ])

#GridSearchCV
# Parameter names in the grid must be prefixed with the step name given in the pipeline.
param_grid = [{'sfs_ridge__k_features': [2, 4, 5], 'sfs_ridge__estimator__alpha': np.arange(0, 1, 0.05)}]

gs_cv = GridSearchCV(estimator= pipe, param_grid= param_grid, scoring="neg_mean_absolute_error", n_jobs = -1, cv=5, refit=True)
gs_cv.fit(X_train, y_train)

print(gs_cv.best_estimator_[0].estimator.alpha)  # prints 0.0
print(gs_cv.best_estimator_[1].alpha)            # prints 1.0

print(gs_cv.best_estimator_[0].k_feature_idx_)
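If the goal is for the final Ridge step to use the same lambda as the selector, one option is to pass GridSearchCV a list of small grids, each pinning both alpha keys to a single shared value. This is a minimal sketch rather than a definitive fix: the name `tied_grid` is hypothetical, and the alpha values are written out in plain Python to approximate `np.arange(0, 1, 0.05)`.

```python
# Candidate alphas 0.00, 0.05, ..., 0.95 (approximately np.arange(0, 1, 0.05)).
alphas = [round(0.05 * i, 2) for i in range(20)]

# One sub-grid per alpha: both the selector's Ridge and the final Ridge
# receive the same value, while k_features still varies within each sub-grid.
tied_grid = [
    {
        "sfs_ridge__k_features": [2, 4, 5],
        "sfs_ridge__estimator__alpha": [alpha],
        "ridge_regression__alpha": [alpha],
    }
    for alpha in alphas
]

# Total number of candidates the search would evaluate.
n_candidates = sum(len(grid["sfs_ridge__k_features"]) for grid in tied_grid)
print(n_candidates)
```

Passing `param_grid=tied_grid` in place of `param_grid` in the GridSearchCV call above should then tune both steps together, since `param_grid` accepts a list of dictionaries.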

python

pipeline

feature-selection

grid-search

hyperparameters

0 Answers
