How do I iteratively select rows in pandas based on column values?

Time:06-10

I'm a complete newbie at pandas so a simpler (though maybe not the most efficient or elegant) solution is appreciated. I don't mind a bit of brute force if I can understand the answer better.

If I have the following DataFrame:

A    B    C 
0    0    1
0    1    1

I want to loop through columns "A", "B" and "C" in that order. During each iteration I want to select all the rows for which the current column is 1 and none of the previous columns are, save the result, and also use it in the next iteration.

So when looking at column A, I wouldn't select anything. Then when looking at column B, I would select the second row because B==1 and A==0. Then when looking at column C, I would select the first row because C==1, A==0 and B==0.
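
A brute-force version of the loop described above might look like the sketch below. The names seen and selected are made up for illustration, and the sketch assumes the columns hold only 0 and 1.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0], "B": [0, 1], "C": [1, 1]})

# Rows that already had a 1 in an earlier column; starts as all False.
seen = pd.Series(False, index=df.index)
selected = {}

for col in ["A", "B", "C"]:
    is_one = df[col] == 1
    # Rows where this column is 1 and no previous column was 1.
    hits = is_one & ~seen
    selected[col] = df[hits].index.tolist()
    # Carry the information forward to the next iteration.
    seen = seen | is_one

print(selected)
```

Each iteration only needs the running seen mask from the previous one, which is the "use it in the next iteration" part of the question.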

CodePudding user response:

Create a boolean mask:

m = (df == 1) & (df.cumsum(axis=1) == 1)
d = {col: df[m[col]].index.tolist() for col in df.columns}

Output:

>>> m
       A      B      C
0  False  False   True
1  False   True  False
2  False  False   True

>>> d
{'A': [], 'B': [1], 'C': [0, 2]}

I slightly modified your dataframe by adding a third row:

>>> df
   A  B  C
0  0  0  1
1  0  1  1
2  0  0  1
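
For anyone who wants to run this end to end, here is a self-contained sketch of the same mask idea on the three-row frame. It is a small variation rather than the exact code above: it takes the cumulative sum over the "equals 1" mask instead of the raw values, so it should still work if a column holds values other than 0 and 1. The names is_one and ones_so_far are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 1, 1]})

is_one = df.eq(1)
# Running count of 1s from left to right within each row.
ones_so_far = is_one.astype(int).cumsum(axis=1)
# True only at the first 1 in each row.
m = is_one & ones_so_far.eq(1)

# Row labels selected for each column.
d = {col: df[m[col]].index.tolist() for col in df.columns}
print(d)
```

Because the running count reaches 1 exactly once per row, each row can be selected by at most one column.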

CodePudding user response:

It seems like you can use idxmax directly. From its documentation:

"Return index of first occurrence of maximum over requested axis. NA/null values are excluded."

>>> df.idxmax()
A    0
B    1
C    0
dtype: int64

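For this dataframe, the values above are the indexes for which your constraints are met. 1 for B means that the second row was "selected"; 0 for C means the first row. Keep in mind that idxmax returns only the first matching row per column and does not look at the previous columns, so it fits the question only when each column selects at most one row. The other issue is that, if nothing is found, it'll also return 0.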

To address that, you can use where:

>>> df.idxmax().where(~df.eq(0).all())

This will make sure that NaNs are returned for all-zero columns:

A    NaN
B    1.0
C    0.0
dtype: float64
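
Since idxmax over the rows yields a single row label per column, a different way to use it (a sketch, not part of the answer above) is to run it along axis=1 instead: find the first column holding a 1 in each row, then collect the row labels per column. The three-row frame and the names first_col and selected are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 1, 1]})

is_one = df.eq(1)

# Column-wise idxmax: one row label per column, so C reports only row 0.
per_column = df.idxmax()

# Row-wise idxmax: name of the first column holding a 1 in each row.
first_col = is_one.astype(int).idxmax(axis=1)
# Drop rows with no 1 at all; idxmax would report "A" for them.
first_col = first_col[is_one.any(axis=1)]

# Row labels grouped by the column that "claims" them.
selected = {col: first_col[first_col == col].index.tolist() for col in df.columns}
print(selected)
```

On this frame rows 0 and 2 both land under C, which the column-wise call cannot express.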