Finding the features used in a Lasso model

I am using the diabetes dataset from sklearn.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes['data'], diabetes['target'], random_state=263)

lasso = Lasso().fit(X_train, y_train)

# Number of features with a non-zero coefficient
print(np.sum(lasso.coef_ != 0))

I split the dataset, then train my Lasso model on the training data. The last line returns how many features the model uses (i.e. how many coefficients are non-zero). How can I get the names of these features in sklearn/Python?

Answer:

You can get the feature names of the diabetes dataset from diabetes['feature_names']. Then you can extract the names of the selected features (i.e. those whose estimated coefficient is non-zero) as follows:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes['data'], diabetes['target'], random_state=263)
lasso = Lasso().fit(X_train, y_train)

names = diabetes['feature_names']
print(names)
# ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

print(np.sum(lasso.coef_ != 0))
# 2

print([name for name, coef in zip(names, lasso.coef_) if coef != 0])
# ['bmi', 's5']
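A slightly shorter variant, sketched here under the same setup, turns the names into a NumPy array and indexes it with the boolean mask lasso.coef_ != 0, then prints each kept feature next to its coefficient. The names mask and selected are only illustrative:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes['data'], diabetes['target'], random_state=263)
lasso = Lasso().fit(X_train, y_train)

# Convert the names to an array so it can be indexed with a boolean mask
names = np.array(diabetes['feature_names'])
mask = lasso.coef_ != 0  # True where Lasso kept the feature

selected = names[mask]
print(selected)

# Show each selected feature together with its estimated coefficient
for name, coef in zip(selected, lasso.coef_[mask]):
    print(f"{name}: {coef:.3f}")
```

The mask keeps the names and coefficients aligned, because both arrays follow the column order of the training data.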

Answer:

If you fit the model on a pandas DataFrame with named columns, you can use:

lasso.feature_names_in_

Reference: feature_names_in_

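Note that this attribute is only defined when the model was fitted on data whose feature names are all strings, such as a pandas DataFrame; when it is fitted on NumPy arrays, as in the question, it does not exist. It also lists every input feature, not just the ones with non-zero coefficients, so you still need to filter it with lasso.coef_ != 0.

It is a fairly new attribute (added in scikit-learn 1.0), so please check that your sklearn library is up to date. You can do so with: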

import sklearn
print(sklearn.__version__)