How Does Sklearn Random Forest Index Feature_importances_
I have used the RandomForestClassifier in sklearn for determining the important features in my dataset. How am I able to return the actual feature names (my variables are labeled x
Solution 1:
feature_importances_ is an array in which index i holds the estimated importance of the i-th feature (column) of the training set. No sorting is done internally; the values correspond 1-to-1 with the features given to the model during training.
If you stored your feature names as a numpy array, in the same order as the columns passed to the model, you can use numpy boolean indexing to pick out the names.
import numpy as np
importances = rf.feature_importances_
# keep the names whose importance is above the mean importance
important_names = feature_names[importances > np.mean(importances)]
print(important_names)
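To make the 1-to-1 correspondence concrete, here is a minimal sketch on synthetic data. The dataset, the names x1 to x5, and the hyperparameters are assumptions chosen for illustration, not taken from the question. It ranks the names by sorting the indices of the importance array.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (illustrative only): 5 columns, 2 of them informative.
X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)
# Hypothetical names, listed in the same order as the columns of X.
feature_names = np.array(["x1", "x2", "x3", "x4", "x5"])

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = rf.feature_importances_  # importances[i] belongs to column i of X

# Column indices ordered from most to least important.
order = np.argsort(importances)[::-1]
ranked = list(zip(feature_names[order], importances[order]))
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the array is never reordered by the model, indexing feature_names with the same indices is enough to recover the names.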
Solution 2:
Here's what I use to print and plot the feature importances together with the feature names, not just the values:
importances = pd.DataFrame({'feature':X_train.columns,'importance':np.round(clf.feature_importances_,3)})
importances = importances.sort_values('importance',ascending=False).set_index('feature')
print(importances)
importances.plot.bar()
Full example:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
import numpy as np
import pandas as pd
# set vars (df is assumed to be an existing DataFrame with columns x1, x2 and y)
predictors = ['x1','x2']
response = 'y'
X = df[predictors]
y = df[response]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
# run model
# max_features must not exceed the number of predictors (2 here), so the default is used
clf = RandomForestClassifier()
clf.fit(X_train.values, y_train.values)
# show and plot importances
importances = pd.DataFrame({'feature':X_train.columns,'importance':np.round(clf.feature_importances_,3)})
importances = importances.sort_values('importance',ascending=False).set_index('feature')
print(importances)
importances.plot.bar()
Solution 3:
Get the variance explained (for a regressor, score returns the R^2 coefficient of determination):
regressor.score(X, y)
Get the importance of each variable (same order as the columns of X):
importances = regressor.feature_importances_
print(importances)
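A minimal end-to-end sketch of these two calls, assuming a RandomForestRegressor and synthetic data (neither is specified in the answer above), might look like this. Scoring on a held-out split gives a less optimistic R^2 than scoring on the training data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only): 4 columns, 2 informative.
X, y = make_regression(n_samples=300, n_features=4, n_informative=2,
                       noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

regressor = RandomForestRegressor(n_estimators=50, random_state=0)
regressor.fit(X_train, y_train)

# R^2 on data the model has not seen.
r2_test = regressor.score(X_test, y_test)
# One importance per column of X, in column order.
importances = regressor.feature_importances_
print(r2_test)
print(importances)
```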