drop rows using pandas groupby and filter


I'm trying to drop rows from a DataFrame where certain conditions are met. In the code below, I group values by column C. For each unique group, I want to drop ALL of the group's rows if any single row has A less than 1 AND B greater than 100; both conditions have to hold on the same row. If I use .any() or .all(), it doesn't return what I want.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
    'B': [101, 2, 3, 1, 5, 101, 2, 3, 4, 5],
    'C': ['d', 'd', 'd', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],
    })

df.groupby(['C']).filter(lambda g: g['A'].lt(1) & g['B'].gt(100))
# fails: filter expects the lambda to return a scalar bool, not a Series

initial df:

   A    B  C
0  1  101  d # A is not lt 1 so keep all d's
1  0    2  d
2  1    3  d
3  0    1  d
4  1    5  e
5  0  101  e # A is lt 1 and B is gt 100 so drop all e's
6  0    2  e
7  1    3  f
8  0    4  f
9  1    5  f

intended output:

   A    B  C
0  1  101  d
1  0    2  d
2  1    3  d
3  0    1  d
7  1    3  f
8  0    4  f
9  1    5  f

CodePudding user response:

For better performance, first get all C values whose rows match the condition, then filter the original column C with Series.isin in boolean indexing, using the inverted mask:

df1 = df[~df['C'].isin(df.loc[df['A'].lt(1) & df['B'].gt(100), 'C'])]
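
To see what the one-liner does, here is the same logic unrolled step by step (a sketch, assuming the df from the question; the names mask and bad_c are just for illustration):

mask = df['A'].lt(1) & df['B'].gt(100)  # per-row test: both conditions on the same row
bad_c = df.loc[mask, 'C']               # C values of offending rows -> only 'e' here
df1 = df[~df['C'].isin(bad_c)]          # drop every row whose group appears in bad_c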

Another idea is to use GroupBy.transform with GroupBy.any to test whether at least one row per group matches:

df1 = df[~(df['A'].lt(1) & df['B'].gt(100)).groupby(df['C']).transform('any')]
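
transform('any') broadcasts each group's result back to every row of that group, so the mask aligns with df. A small sketch of the intermediate mask (again assuming the df above, with the expected output as a comment):

mask = (df['A'].lt(1) & df['B'].gt(100)).groupby(df['C']).transform('any')
print(mask.tolist())
# [False, False, False, False, True, True, True, False, False, False]
df1 = df[~mask]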

Your groupby.filter solution is also possible; the lambda just has to return a scalar bool, here by reducing the per-row mask with any and negating it with not (or, equivalently by De Morgan's laws, testing the inverted conditions with all). On a large DataFrame it will be slow:

df1 = df.groupby(['C']).filter(lambda g: not (g['A'].lt(1) & g['B'].gt(100)).any())

df1 = df.groupby(['C']).filter(lambda g: (g['A'].ge(1) | g['B'].le(100)).all())

print(df1)
   A    B  C
0  1  101  d
1  0    2  d
2  1    3  d
3  0    1  d
7  1    3  f
8  0    4  f
9  1    5  f
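
If you want to check the performance claim on your own data, a minimal timeit sketch (the function names are illustrative; on a 10-row frame the difference is negligible and only shows up on larger frames):

import timeit

def via_isin():
    return df[~df['C'].isin(df.loc[df['A'].lt(1) & df['B'].gt(100), 'C'])]

def via_filter():
    return df.groupby(['C']).filter(lambda g: not (g['A'].lt(1) & g['B'].gt(100)).any())

print(timeit.timeit(via_isin, number=1000))
print(timeit.timeit(via_filter, number=1000))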