Home > other >  removing row based on many compound condition
removing row based on many compound condition

Time:09-06

i have a data frame with columns [ID,Date] i want to get the id's that has occurred more than once and check whether they happend in the same date or not. i tried this way for it.

#sample data
   ID        Date
0   1  2020-01-01
1   1  2020-01-01
2   1  2020-02-01
3   2  2020-01-01

#expected output
   ID        Date
   
0   1  2020-02-01
1   2  2020-01-01

this is what i tried

df.drop(df[(df['Date'] >1) & (df['ID'] > 1)])

but it is not giving me correct values.

CodePudding user response:

Use GroupBy.transform for count groups and filter if greater like 1:

print (df)
   ID        Date
0   1  2020-01-01
1   1  2020-01-01
2   2  2020-01-01

df1 = df[df.groupby(['Date','ID'])['ID'].transform('size').gt(1)]
print (df1)
   ID        Date
0   1  2020-01-01
1   1  2020-01-01

Or you can test if duplicated by both columns in DataFrame.duplicated:

df1 = df[df.duplicated(['Date','ID'], keep=False)]

For remove duplicated rows use:

df2 = df[df.groupby(['Date','ID'])['ID'].transform('size').eq(1)]
df2 = df[~df.duplicated(['Date','ID'], keep=False)]
df2 = df.drop_duplicates(['Date','ID'], keep=False)

print (df2)
   ID        Date
2   2  2020-01-01
  • Related