I have already referred to these two posts:
Please don't mark this as a duplicate.
I am trying to get the feature names from a BaggingClassifier (which does not have a built-in feature importance attribute).
I have the sample data and code below, based on the related posts linked above:
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = BaggingClassifier(DecisionTreeClassifier())
clf.fit(X, y)
feature_importances = np.mean([tree.feature_importances_ for tree in clf.estimators_], axis=0)
but this outputs only the feature importances (shown below); I also want the corresponding feature names.
feature_importances
# array([0.15098599, 0.27608213, 0.33606019, 0.23687169])
How can I find the corresponding feature names for these feature importance values?
CodePudding user response:
You could call the load_iris function without any parameters; that way the function returns a Bunch object (a dictionary-like object) with several attributes. The most relevant ones for your use case are bunch.data (the feature matrix), bunch.target, and bunch.feature_names.
...
bunch = load_iris()
X = bunch.data
y = bunch.target
feature_names = bunch.feature_names
clf = BaggingClassifier(DecisionTreeClassifier(), random_state=42)
clf.fit(X, y)
feature_importances = np.mean([tree.feature_importances_ for tree in clf.estimators_], axis=0)
output = {fn: fi for fn, fi in zip(feature_names, feature_importances)}
print(output)
{
'sepal length (cm)': 0.008652347823679744,
'sepal width (cm)': 0.01945400672681583,
'petal length (cm)': 0.539297348817521,
'petal width (cm)': 0.43259629663198346
}
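If you also want the features ranked from most to least important, a small follow-up sketch building on the code above (the exact importance values will vary with your scikit-learn version and the bagging randomness, so don't rely on the specific numbers):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bunch = load_iris()
clf = BaggingClassifier(DecisionTreeClassifier(), random_state=42)
clf.fit(bunch.data, bunch.target)

# Average the per-tree importances, exactly as in the answer above
feature_importances = np.mean(
    [tree.feature_importances_ for tree in clf.estimators_], axis=0
)

# Pair each name with its importance and sort in descending order
ranked = sorted(
    zip(bunch.feature_names, feature_importances),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.4f}")
```

This keeps the name-to-importance mapping intact while making the ordering explicit, which is usually what you want when inspecting which features drive the model.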