Pandas: delete all rows sharing a duplicated value in one column if any value in another column is higher than a threshold

Time:11-12

I have a dataframe where column A contains duplicated values, and the rows that share a value in column A have different values in column B.

I want to delete every row for a given column A value if any of those rows has a value higher than 15 in column B.

Original dataframe

A Column B Column
1 10
1 14
2 10
2 20
3 5
3 10

Desired dataframe

A Column B Column
1 10
1 14
3 5
3 10

CodePudding user response:

This works:

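# Keep only the 'A Column' groups whose largest 'B Column' value is at most 15
dfnew = df.groupby('A Column').filter(lambda x: x['B Column'].max() <= 15)
# Renumber the index from 0 after the rows are dropped
dfnew = dfnew.reset_index(drop=True)
# Select the columns in the desired order
dfnew = dfnew[['A Column', 'B Column']]
print(dfnew)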

output:

   A Column  B Column
0         1        10
1         1        14
2         3         5
3         3        10
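
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the variable names `df` and `dfnew` follow the answer. The constant `THRESHOLD` is an illustrative name introduced here, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "A Column": [1, 1, 2, 2, 3, 3],
    "B Column": [10, 14, 10, 20, 5, 10],
})

THRESHOLD = 15  # illustrative name, not from the original post

# Keep only the 'A Column' groups whose largest 'B Column' value
# does not exceed the threshold; otherwise the whole group is dropped.
dfnew = (
    df.groupby("A Column")
      .filter(lambda g: g["B Column"].max() <= THRESHOLD)
      .reset_index(drop=True)
)
print(dfnew)
```

The condition is evaluated once per group, so a single value above the threshold removes every row that shares that 'A Column' value.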

CodePudding user response:

Here is another way, using groupby() and transform():

df.loc[~df['B Column'].gt(15).groupby(df['A Column']).transform('any')]
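
To make the one-liner easier to follow, it can be split into its three steps. This is only a sketch: the sample frame is rebuilt from the question, and the intermediate names `over`, `group_has_over` and `result` are made up here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "A Column": [1, 1, 2, 2, 3, 3],
    "B Column": [10, 14, 10, 20, 5, 10],
})

# Step 1: flag each row whose 'B Column' value is above 15.
over = df["B Column"].gt(15)

# Step 2: spread the flag to every row that shares the same 'A Column' value.
group_has_over = over.groupby(df["A Column"]).transform("any")

# Step 3: keep only the rows whose group never exceeded 15.
result = df.loc[~group_has_over]
print(result)
```

Unlike the first answer, this keeps the original index labels; append `.reset_index(drop=True)` if a fresh 0-based index is wanted.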