i have a data frame with columns [ID,Date] i want to get the id's that has occurred more than once and check whether they happend in the same date or not. i tried this way for it.
#sample data
ID Date
0 1 2020-01-01
1 1 2020-01-01
2 1 2020-02-01
3 2 2020-01-01
#expected output
ID Date
0 1 2020-02-01
1 2 2020-01-01
this is what i tried
df.drop(df[(df['Date'] >1) & (df['ID'] > 1)])
but it is not giving me correct values.
CodePudding user response:
Use GroupBy.transform
for count groups and filter if greater like 1
:
print (df)
ID Date
0 1 2020-01-01
1 1 2020-01-01
2 2 2020-01-01
df1 = df[df.groupby(['Date','ID'])['ID'].transform('size').gt(1)]
print (df1)
ID Date
0 1 2020-01-01
1 1 2020-01-01
Or you can test if duplicated by both columns in DataFrame.duplicated
:
df1 = df[df.duplicated(['Date','ID'], keep=False)]
For remove duplicated rows use:
df2 = df[df.groupby(['Date','ID'])['ID'].transform('size').eq(1)]
df2 = df[~df.duplicated(['Date','ID'], keep=False)]
df2 = df.drop_duplicates(['Date','ID'], keep=False)
print (df2)
ID Date
2 2 2020-01-01