feature importance bagging classifier and column names


I have already referred to two related posts on this topic, so please don't mark this as a duplicate.

I am trying to get the feature names from a BaggingClassifier (which does not expose a built-in feature_importances_ attribute).

Based on those related posts, I have the following sample data and code:

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
clf = BaggingClassifier(DecisionTreeClassifier())
clf.fit(X, y)

feature_importances = np.mean([tree.feature_importances_ for tree in clf.estimators_], axis=0)

This outputs only the feature importances (shown below), but I also want the corresponding feature names.

feature_importances 
# array([0.15098599, 0.27608213, 0.33606019, 0.23687169])

How can I find the corresponding feature names for these feature importance values?

CodePudding user response:

You could call the load_iris function without any arguments. The function will then return a Bunch object (a dictionary-like object) with several attributes. The most relevant ones for your use case are bunch.data (the feature matrix), bunch.target (the labels), and bunch.feature_names.

...

bunch = load_iris()
X = bunch.data
y = bunch.target
feature_names = bunch.feature_names

clf = BaggingClassifier(DecisionTreeClassifier(), random_state=42)
clf.fit(X, y)

feature_importances = np.mean([tree.feature_importances_ for tree in clf.estimators_], axis=0)

# Pair each feature name with its averaged importance
output = {fn: fi for fn, fi in zip(feature_names, feature_importances)}
print(output)
{
    'sepal length (cm)': 0.008652347823679744,
    'sepal width (cm)': 0.01945400672681583,
    'petal length (cm)': 0.539297348817521,
    'petal width (cm)': 0.43259629663198346
}
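As a side note, the same pairing can be done with a pandas Series, which also makes it easy to sort the features by importance. This is a minimal sketch assuming pandas is available; it reuses the same averaging over clf.estimators_ shown above:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bunch = load_iris()
clf = BaggingClassifier(DecisionTreeClassifier(), random_state=42)
clf.fit(bunch.data, bunch.target)

# Average the per-tree importances, then index them by feature name
importances = pd.Series(
    np.mean([tree.feature_importances_ for tree in clf.estimators_], axis=0),
    index=bunch.feature_names,
).sort_values(ascending=False)
print(importances)
```

Because each decision tree's importances sum to 1, the averaged values do as well, so the Series can be read directly as fractional contributions.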