Filter by max date and groupby in pandas

I would like to use this dataframe

df = pd.DataFrame({'Serial' : ['A1', 'A1', 'A1', 'B1','B1', 'B1'],'Day' : ['01.01.2022', '01.01.2022', '01.01.2021', '01.01.2019', '01.01.2019', '01.01.2020'],'Else' : ['a', 'b', 'c', 'd','e', 'f']})

to group by Serial and keep only the rows with max(Day), i.e. this is my expected output:

Serial	Day	Else
A1	01.01.2022	a
A1	01.01.2022	b
B1	01.01.2020	f

I managed to compute the max, but I don't know how to use it to filter the rows and get the expected output.

df['Day']= pd.to_datetime(df['Day'], format="%d.%m.%Y")
df = df.groupby(['Serial'])['Day'].max()

CodePudding user response：

Here is one way to do it:

# parse the date strings into datetimes so that max compares chronologically
df['Day2']=pd.to_datetime(df['Day'], dayfirst=True)


# group on Serial and broadcast each group's max date to every row of that group
# keep the rows whose date equals that max, then drop the helper column

out = df.loc[df['Day2'].eq(df.groupby('Serial')['Day2'].transform('max'))].drop(columns='Day2')

out

  Serial         Day Else
0     A1  01.01.2022    a
1     A1  01.01.2022    b
5     B1  01.01.2020    f
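
To see why the comparison above works, it may help to look at what transform returns next to a plain max. This is only a minimal sketch using the data from the question; the names day, per_group, per_row and mask are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Serial': ['A1', 'A1', 'A1', 'B1', 'B1', 'B1'],
    'Day': ['01.01.2022', '01.01.2022', '01.01.2021',
            '01.01.2019', '01.01.2019', '01.01.2020'],
    'Else': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# parsed copy of the dates; the original string column is left untouched
day = pd.to_datetime(df['Day'], format='%d.%m.%Y')

# plain aggregation: one value per group, indexed by Serial
per_group = day.groupby(df['Serial']).max()

# transform: the group maximum repeated on every row, aligned with df's index
per_row = day.groupby(df['Serial']).transform('max')

# per_row has the same index as df, so it can be compared row by row
mask = day.eq(per_row)
out = df[mask]
print(out)
```

The aggregated result has one entry per Serial and cannot be compared against the six rows directly, while the transformed result has one entry per original row, which is what makes the boolean filter possible.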

CodePudding user response：

Based on this answer, you should first build a boolean mask that marks the rows where the date equals the maximum of its group. Then you can use that mask to index your dataframe. Something like this:

df = pd.DataFrame({'Serial' : ['A1', 'A1', 'A1', 'B1','B1', 'B1'],'Day' : ['01.01.2022', '01.01.2022', '01.01.2021', '01.01.2019', '01.01.2019', '01.01.2020'],'Else' : ['a', 'b', 'c', 'd','e', 'f']})
df['Day'] = pd.to_datetime(df['Day'], format="%d.%m.%Y")
idx = df.groupby(['Serial'])['Day'].transform('max') == df['Day']
print(df[idx])

This gives you the following result:

  Serial        Day Else
0     A1 2022-01-01    a
1     A1 2022-01-01    b
5     B1 2020-01-01    f
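
If you would rather reuse the groupby max that the question already computes, one possible alternative is to merge it back onto the original dataframe. This is a sketch under the assumption that an inner merge on both Serial and Day is acceptable; the names latest and out are illustrative, and the final strftime step is optional and only restores the original dd.mm.yyyy strings.

```python
import pandas as pd

df = pd.DataFrame({
    'Serial': ['A1', 'A1', 'A1', 'B1', 'B1', 'B1'],
    'Day': ['01.01.2022', '01.01.2022', '01.01.2021',
            '01.01.2019', '01.01.2019', '01.01.2020'],
    'Else': ['a', 'b', 'c', 'd', 'e', 'f'],
})
df['Day'] = pd.to_datetime(df['Day'], format='%d.%m.%Y')

# one row per Serial holding its latest Day (as_index=False keeps Serial as a column)
latest = df.groupby('Serial', as_index=False)['Day'].max()

# inner merge on both columns keeps only the rows whose Day equals the group maximum
out = df.merge(latest, on=['Serial', 'Day'], how='inner')

# optional: turn the datetimes back into the original dd.mm.yyyy strings
out['Day'] = out['Day'].dt.strftime('%d.%m.%Y')
print(out)
```

Note that the merge produces a fresh 0..n-1 index, whereas the boolean mask keeps the original row labels (0, 1, 5).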