
How Does Sklearn Random Forest Index feature_importances_

I used the RandomForestClassifier in sklearn to determine the important features in my dataset. How can I return the actual feature names (my variables are labeled x

Solution 1:

feature_importances_ returns an array in which each index holds the estimated importance of the feature at that same position in the training set. No sorting is done internally; the array has a 1-to-1 correspondence with the features given to the model during training.

If you stored your feature names as a numpy array and made sure its order is consistent with the features passed to the model, you can use numpy boolean indexing to pick out the names.

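import numpy as np

# rf is the fitted forest; feature_names is a numpy array in training column order
importances = rf.feature_importances_
important_names = feature_names[importances > np.mean(importances)]
print(important_names)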
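
To make the index correspondence concrete, here is a minimal self-contained sketch. The synthetic data from make_classification and the labels x1..x5 are assumptions for illustration only, not part of the original question. It ranks the names by importance with np.argsort, which works because position i in feature_importances_ refers to column i of the training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data; the feature names below are hypothetical labels.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2,
    n_redundant=0, shuffle=False, random_state=0,
)
feature_names = np.array(["x1", "x2", "x3", "x4", "x5"])

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

importances = rf.feature_importances_

# importances[i] belongs to column i of X, so the same index
# can be applied to feature_names.
order = np.argsort(importances)[::-1]  # most important first
ranked = list(zip(feature_names[order], importances[order]))

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

If the names live in a plain Python list rather than a numpy array, convert it with np.array first, since lists do not support this kind of indexing.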

Solution 2:

Here's what I use to print and plot the feature importances together with the feature names, not just the values:

# clf is the fitted model; X_train is the DataFrame it was trained on
importances = pd.DataFrame({'feature':X_train.columns,'importance':np.round(clf.feature_importances_,3)})
importances = importances.sort_values('importance',ascending=False).set_index('feature')
print(importances)
importances.plot.bar()

Full example

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed
import numpy as np
import pandas as pd

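# set vars (df is assumed to be an existing DataFrame with these columns)
predictors = ['x1','x2']
response = 'y'

X = df[predictors]
y = df[response]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# run model (max_features is capped so it never exceeds the number of predictors)
clf = RandomForestClassifier(max_features=min(5, len(predictors)))
clf.fit(X_train.values, y_train.values)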

# show and plot importances
importances = pd.DataFrame({'feature':X_train.columns,'importance':np.round(clf.feature_importances_,3)})
importances = importances.sort_values('importance',ascending=False).set_index('feature')
print(importances)
importances.plot.bar()
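
Since df is not defined in the snippet above, here is a runnable variant as a rough sketch. The DataFrame is a hypothetical stand-in (y depends only on x1, x2 is pure noise), and a pd.Series indexed by the column names is used in place of the two-column DataFrame. Plotting is left out so it runs without matplotlib.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df: y is determined by x1, x2 is noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=500), "x2": rng.normal(size=500)})
df["y"] = (df["x1"] > 0).astype(int)

predictors = ["x1", "x2"]
response = "y"

X_train, X_test, y_train, y_test = train_test_split(
    df[predictors], df[response], test_size=0.20, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train.values, y_train.values)

# The importances follow the column order of X_train, so the column
# names can be used directly as the index.
ranked = pd.Series(clf.feature_importances_, index=X_train.columns)
ranked = ranked.sort_values(ascending=False)
print(ranked)
```

Because the model is fitted on X_train.values, it never sees the column names itself; the pairing relies entirely on X_train.columns being in the same order as the array passed to fit.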

Solution 3:

Get the variance explained (the R^2 score of a fitted regressor on X, y):

regressor.score(X, y)

Get the importance of each variable, in the same order as the columns of X:

importances = regressor.feature_importances_
print(importances)
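
The same idea works for a regressor. The following is only a sketch under assumptions: the data comes from make_regression and the names x1..x4 are made up for illustration. It computes the R^2 score and pairs each importance with its name.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data; feature names are hypothetical labels.
X, y = make_regression(
    n_samples=300, n_features=4, n_informative=2, noise=0.1, random_state=0
)
feature_names = ["x1", "x2", "x3", "x4"]

regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor.fit(X, y)

# R^2 on the data passed in (here the training data, so it is optimistic).
r2 = regressor.score(X, y)
print(r2)

# Pair each importance with the name of the column at the same position.
named = dict(zip(feature_names, regressor.feature_importances_))
print(named)
```

Scoring on held-out data instead of the training set gives a more honest estimate of the variance explained.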
