How to append dataframe column name in the list?

Time:08-17

I am new to pandas. I am trying to append to a list the names of the columns whose correlation with "failure" is greater than zero.

Here is my code:

corr_matrix = df_train.corr()
corr_matrix["failure"].sort_values(ascending=False)

useful_features = []
for f in corr_matrix["failure"]:
    if f > 0:
        useful_features.append(df_train.columns)
print(useful_features)

But this appends the full set of column names to the list on every iteration:

[Index(['id', 'product_code', 'loading', 'attribute_0', 'attribute_1',
       'attribute_2', 'attribute_3', 'measurement_0', 'measurement_1',
       'measurement_2', 'measurement_3', 'measurement_4', 'measurement_5',
       'measurement_6', 'measurement_7', 'measurement_8', 'measurement_9',
       'measurement_10', 'measurement_11', 'measurement_12', 'measurement_13',
       'measurement_14', 'measurement_15', 'measurement_16', 'measurement_17',
       'failure', 'kfold'],
.
.
.
(I am not pasting the complete output.)

What I want is

useful_features = ['failure','loading',...,'kfold']

Output of corr_matrix["failure"].sort_values(ascending=False)

failure           1.000000
loading           0.129089
measurement_17    0.033905
measurement_5     0.018079
measurement_8     0.017119
measurement_7     0.016787
measurement_2     0.015808
measurement_6     0.014791
measurement_0     0.009646
attribute_2       0.006337
measurement_14    0.006211
measurement_12    0.004398
measurement_3     0.003577
measurement_16    0.002237
kfold             0.000130
measurement_10   -0.001515
measurement_13   -0.001831
measurement_15   -0.003544
measurement_9    -0.003587
measurement_11   -0.004801
id               -0.007545
measurement_4    -0.010488
measurement_1    -0.010810
attribute_3      -0.019222
Name: failure, dtype: float64

Is there any way to append only the matching column names? Using df_train.columns.values also appends all the names to the list.

CodePudding user response:

You can use boolean indexing on the index of the correlation matrix to do this:

print(
    corr_matrix.index[corr_matrix["failure"] > 0]
)

This translates to:

  1. Get the index (the column names) from the correlation matrix.
  2. Evaluate where the "failure" column is > 0, which gives a boolean mask.
  3. Use that mask to filter the index, keeping only the matching names.
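The steps above can be sketched end to end as follows. The tiny df_train here is made-up illustration data, not the asker's dataset; only the column names are borrowed from the question. The sketch also shows why the original loop failed: iterating over a Series yields only the values, so the loop never sees the column name and falls back to appending df_train.columns as a whole. Iterating with .items() gives (name, value) pairs instead.

```python
import pandas as pd

# Hypothetical toy data, assumed for illustration only.
df_train = pd.DataFrame({
    "loading": [1.0, 2.0, 3.0, 4.0],        # rises with failure
    "measurement_1": [4.0, 3.0, 2.0, 1.0],  # falls with failure
    "failure": [0, 0, 1, 1],
})

corr_matrix = df_train.corr()
failure_corr = corr_matrix["failure"]

# Boolean mask on the Series, then take the surviving index labels
# and convert them to a plain Python list.
useful_features = failure_corr[failure_corr > 0].index.tolist()
print(useful_features)  # ['loading', 'failure']

# Loop-based alternative: .items() yields (column name, correlation) pairs,
# so the name is available to append.
useful_loop = []
for name, value in failure_corr.items():
    if value > 0:
        useful_loop.append(name)
print(useful_loop)  # ['loading', 'failure']
```

Note that "failure" itself is always included because its correlation with itself is 1; drop it from the list if only predictor columns are wanted.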