pandas extracting column names with values greater than a threshold in a list

Time:06-09

I have a dataframe that looks something like this:

             A         B        C       D ........
   AB        2         1        3       4 ........
   BC        1         3        5       4 ........
   CD        3         2        6       3 ........
   DE        2         1        2       2 ........

I want to extract all the column names where the value is greater than 2 for a specified row and I want these column names returned in a list. So for example if I want this for row "AB", then the result should be like

  ['C', 'D', (all further column names with value greater than 2)]

and if I want this for row "BC", then the result should be like

  ['B', 'C', 'D', (all further column names with value greater than 2)]

I spent over an hour searching online for something that could help me with this, but couldn't find anything. Could someone please help? I would very much appreciate it.

CodePudding user response:

Use DataFrame.loc to select the row by its index label, build a boolean mask of the values greater than 2 with Series.gt, and use that mask to filter the column names:

out = df.columns[df.loc['AB'].gt(2)].tolist()
print(out)
['C', 'D']
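For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame contains only the four columns shown in the question (the elided "........" columns are left out), and the helper name columns_above is made up for illustration; it is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question; only columns A-D are included.
df = pd.DataFrame(
    {"A": [2, 1, 3, 2], "B": [1, 3, 2, 1], "C": [3, 5, 6, 2], "D": [4, 4, 3, 2]},
    index=["AB", "BC", "CD", "DE"],
)

def columns_above(frame, row_label, threshold=2):
    """Hypothetical helper: column names whose value in row_label exceeds threshold."""
    row = frame.loc[row_label]                 # one row as a Series indexed by column name
    return row[row > threshold].index.tolist()  # keep matching entries, return their labels

result_ab = columns_above(df, "AB")
result_bc = columns_above(df, "BC")
print(result_ab)  # ['C', 'D']
print(result_bc)  # ['B', 'C', 'D']
```

Filtering the row Series and then reading its index gives the same list as indexing df.columns with the mask; it is just a matter of preference.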

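If you need to process all rows at once, compare the whole frame with DataFrame.gt and then use DataFrame.apply with axis=1 to collect the matching column names per row: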

out1 = df.gt(2).apply(lambda x: x.index[x].tolist(), axis=1)
print(out1)
AB       [C, D]
BC    [B, C, D]
CD    [A, C, D]
DE           []
dtype: object
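If a plain dictionary is more convenient than a Series of lists, the same per-row result can be built with a comprehension over the boolean mask. This is only a sketch of one alternative, again assuming just the four columns shown in the question; the variable names mask and cols_by_row are illustrative.

```python
import pandas as pd

# Sample data copied from the question; only columns A-D are included.
df = pd.DataFrame(
    {"A": [2, 1, 3, 2], "B": [1, 3, 2, 1], "C": [3, 5, 6, 2], "D": [4, 4, 3, 2]},
    index=["AB", "BC", "CD", "DE"],
)

# Boolean frame: True wherever the value is greater than 2.
mask = df.gt(2)

# Map each row label to the list of column names flagged True in that row.
cols_by_row = {
    label: mask.columns[flags.to_numpy()].tolist()
    for label, flags in mask.iterrows()
}

print(cols_by_row["AB"])  # ['C', 'D']
print(cols_by_row["DE"])  # []
```

Looking up a single row is then just cols_by_row['BC'], which may be handy when many rows are queried repeatedly.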