I have a dataframe with several columns and rows; all values in each column are numbers. I want to find which cells satisfy a condition and then get each cell's column, index key and value.
E.g.:
a b
x 1 3
y 2 2
z 3 1
If the condition is value > 2, I want to return something like:
[('a', 'z', 3), ('b', 'x', 3)]
The exact return format doesn't really matter, but I want to be able to get this information in some way.
CodePudding user response:
You can stack, giving you a Series with a MultiIndex where the first level is the index and the second level is the columns, then filter this single Series. Here I rename the axes so the resulting Series is a bit more descriptive.
s = df.rename_axis(index='index', columns='col').stack().loc[lambda x: x>2]
#index col
#x b 3
#z a 3
#dtype: int64
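For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question first. The construction of df and the intermediate name stacked are my own assumptions for illustration; only the stack and filter steps come from the line above.

```python
import pandas as pd

# Sample frame from the question: index x, y, z and columns a, b.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1]}, index=['x', 'y', 'z'])

# stack() moves the column labels into a second index level,
# so every cell becomes one entry keyed by (index, col).
stacked = df.rename_axis(index='index', columns='col').stack()

# Keep only the cells that satisfy the condition.
s = stacked.loc[lambda v: v > 2]
print(s)
```

Splitting the chain in two makes it easy to inspect stacked before filtering, which can help when the condition is more involved than a single comparison.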
If you want the columns to be first and the index to be second, then after the stack you can chain on a .swaplevel(0,1).
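A small sketch of that level swap, again assuming the sample frame from the question. The sort_index() call is an optional extra on my part so the result is ordered by column; it is not required for the swap itself.

```python
import pandas as pd

# Sample frame from the question (assumed construction).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1]}, index=['x', 'y', 'z'])

swapped = (
    df.rename_axis(index='index', columns='col')
      .stack()
      .swaplevel(0, 1)          # levels are now (col, index)
      .sort_index()             # optional: order by col, then index
      .loc[lambda v: v > 2]
)
print(swapped)
```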
If you want to get some other container, the tuples are a bit of a pain, but you can get an array pretty easily.
s.reset_index().to_numpy()
#array([['x', 'b', 3],
# ['z', 'a', 3]], dtype=object)
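If a list of tuples is what you need after all, one possible route is to reset the index and iterate the rows as plain tuples. This is only a sketch under the same assumed sample frame; the column name 'value' and the variable name tuples are hypothetical choices of mine.

```python
import pandas as pd

# Sample frame from the question (assumed construction).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1]}, index=['x', 'y', 'z'])

s = (
    df.rename_axis(index='index', columns='col')
      .stack()
      .swaplevel(0, 1)          # put the column label first
      .sort_index()
      .loc[lambda v: v > 2]
)

# reset_index turns both index levels into ordinary columns;
# itertuples(index=False, name=None) yields plain tuples per row.
tuples = list(s.reset_index(name='value').itertuples(index=False, name=None))
print(tuples)  # [('a', 'z', 3), ('b', 'x', 3)]
```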