I have a dataframe that looks something like this
     A  B  C  D  ........
AB   2  1  3  4  ........
BC   1  3  5  4  ........
CD   3  2  6  3  ........
DE   2  1  2  2  ........
For a specified row, I want to extract the names of all columns whose value is greater than 2, and I want these column names returned in a list. For example, if I want this for row "AB", then the result should be like
['C', 'D', (all further column names with value greater than 2)]
and if I want this for row "BC", then the result should be like
['B', 'C', 'D', (all further column names with value greater than 2)]
I tried for over an hour looking online for something that can help me with this, but couldn't find anything. Could someone please help me with this? I will very much appreciate any help
Answer:
Use DataFrame.loc to select the row by its index label, then filter the column names with the boolean mask returned by Series.gt, which tests for values greater than 2:
out = df.columns[df.loc['AB'].gt(2)].tolist()
print(out)
['C', 'D']
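To try this end to end, here is a minimal sketch that rebuilds only the four columns visible in the question (the elided columns are left out) and wraps the same lookup in a helper. The name columns_above and its threshold parameter are hypothetical, chosen just for illustration.

```python
import pandas as pd

# Sample data: only the four columns shown in the question; the elided ones are omitted.
df = pd.DataFrame(
    {"A": [2, 1, 3, 2], "B": [1, 3, 2, 1], "C": [3, 5, 6, 2], "D": [4, 4, 3, 2]},
    index=["AB", "BC", "CD", "DE"],
)


def columns_above(frame, row_label, threshold=2):
    """Return the column names whose value in row `row_label` exceeds `threshold`.

    Hypothetical helper, not part of pandas.
    """
    # Boolean Series indexed by column name: True where the value is > threshold.
    mask = frame.loc[row_label].gt(threshold)
    return frame.columns[mask].tolist()


print(columns_above(df, "AB"))  # ['C', 'D']
print(columns_above(df, "BC"))  # ['B', 'C', 'D']
```

Note that gt is a strict comparison, so a value equal to 2 is not included; Series.ge would include it.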
If you need to process all rows at once, use DataFrame.apply with axis=1:
out1 = df.gt(2).apply(lambda x: x.index[x].tolist(), axis=1)
print(out1)
AB       [C, D]
BC    [B, C, D]
CD    [A, C, D]
DE           []
dtype: object
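If a plain dictionary keyed by row label is more convenient than a Series of lists, one possible variation is a dict comprehension over DataFrame.iterrows instead of apply. This is a sketch on the same four visible columns (an assumption, since the question elides the rest); the names mask and result are illustrative only.

```python
import pandas as pd

# Sample data: only the four columns shown in the question; the elided ones are omitted.
df = pd.DataFrame(
    {"A": [2, 1, 3, 2], "B": [1, 3, 2, 1], "C": [3, 5, 6, 2], "D": [4, 4, 3, 2]},
    index=["AB", "BC", "CD", "DE"],
)

# Boolean DataFrame: True wherever a value is strictly greater than 2.
mask = df.gt(2)

# For every row, keep the column names where the mask is True.
result = {label: mask.columns[row].tolist() for label, row in mask.iterrows()}

print(result["AB"])  # ['C', 'D']
print(result["DE"])  # []
```

A lookup such as result["BC"] then returns the list for that row directly.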