I am using the diabetes dataset from sklearn.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes['data'], diabetes['target'], random_state=263)
from sklearn.linear_model import Lasso
lasso = Lasso().fit(X_train, y_train)
import numpy as np
np.sum(lasso.coef_ != 0)
I split the dataset, then train my Lasso model on the training set. The last line returns how many features the model uses, i.e. how many coefficients are non-zero. How can I get the names of these features in sklearn/Python?
CodePudding user response:
You can get the feature names of the diabetes dataset using diabetes['feature_names']. After that you can extract the names of the selected features (i.e. the ones with estimated coefficient different from zero) as follows:
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes['data'], diabetes['target'], random_state=263)
lasso = Lasso().fit(X_train, y_train)
names = diabetes['feature_names']
print(names)
# ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
print(np.sum(lasso.coef_ != 0))
# 2
print([names[i] for i in range(len(names)) if lasso.coef_[i] != 0])
# ['bmi', 's5']
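The same selection can also be written with a NumPy boolean mask instead of a list comprehension. This is only a sketch of an alternative: the variable names mask and selected are my own, and it assumes the same split and default Lasso settings as above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes['data'], diabetes['target'], random_state=263
)
lasso = Lasso().fit(X_train, y_train)

# Convert the list of names to an array so it can be indexed with a mask.
names = np.array(diabetes['feature_names'])

# True where Lasso kept the feature (non-zero coefficient).
mask = lasso.coef_ != 0
selected = names[mask]

print(selected)
# Pair each kept feature with its coefficient.
print(dict(zip(selected, lasso.coef_[mask])))
```

The mask keeps the names and the coefficients aligned, so it is easy to report both together.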
CodePudding user response:
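You can use the attribute below, but only if the model was fitted on input that carries string column names, such as a pandas DataFrame (for example, data loaded with load_diabetes(as_frame=True)). It is not set when the model is fitted on a plain NumPy array, as in the question, and it lists all input features, not only those with non-zero coefficients: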
lasso.feature_names_in_
Reference: feature_names_in_
It is a fairly new attribute (added in scikit-learn 1.0), so please check that your sklearn installation is up to date. You can do so with:
import sklearn
print(sklearn.__version__)