How to get the column name of a dataframe from values in a numpy array

Time:03-24

I have a DataFrame df with 15 columns. df.columns gives:

    0  class
    1  name
    2  location
    3  income
    4  edu_level
    --
    14 marital_status

After some transformations I got a numpy.ndarray with shape (15, 3) named loads:

0.52   0.33   0.09
0.20   0.53   0.23
0.60   0.28   0.23
0.13   0.45   0.41
0.49   0.9
so on  so on  so on

So, 3 columns with 15 values each.

What I need to do:

I want to get the df column names for the values in the first column of loads that are greater than 0.50.

For this example, the df columns whose values in the first column of loads are higher than 0.5 should be returned:

0 class
2 location

The same for the second column of loads, which should return:

1 name
3 income
4 edu_level

The same logic applies to the third column of loads.

I managed to get the numpy array loads the way I need it, but I am having a hard time with this last part. I know I can simply pick the columns manually, but that will be tedious when df has more than 15 features.

Can anyone help me, please?

CodePudding user response:

Given your threshold, you can build a boolean array from each column of loads and use it to filter df.columns:

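threshold = 0.5

# For each column j of loads, build a boolean mask over its rows
# and use it to select the matching df column names.
for j in range(loads.shape[1]):
    print(df.columns[loads[:, j] > threshold])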
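For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a cut-down df with only the five column names listed in the question and uses the first five rows of loads; the third value of the last row is made up, because the question leaves it out. The `selected` dictionary is just an illustrative way to keep the results instead of printing them.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 5 columns instead of the original 15.
df = pd.DataFrame(columns=["class", "name", "location", "income", "edu_level"])

loads = np.array([
    [0.52, 0.33, 0.09],
    [0.20, 0.53, 0.23],
    [0.60, 0.28, 0.23],
    [0.13, 0.45, 0.41],
    [0.49, 0.90, 0.10],  # last value invented; the question's row is incomplete
])

threshold = 0.5

# Row i of loads corresponds to df.columns[i], so a boolean mask taken
# from one column of loads can index df.columns directly.
selected = {
    j: df.columns[loads[:, j] > threshold].tolist()
    for j in range(loads.shape[1])
}

for j, names in selected.items():
    print(j, names)
```

With these sample values, the first column of loads selects class and location. Note that income (0.45) stays below the threshold in the second column, so only name and edu_level are selected there. The mask must have exactly as many entries as df has columns, so this only works when loads has one row per df column.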