Filter by maxdate and groupby pandas

Time:11-18

I would like to use this dataframe

df = pd.DataFrame({'Serial' : ['A1', 'A1', 'A1', 'B1','B1', 'B1'],'Day' : ['01.01.2022', '01.01.2022', '01.01.2021', '01.01.2019', '01.01.2019', '01.01.2020'],'Else' : ['a', 'b', 'c', 'd','e', 'f']})

to group by Serial and keep only the rows where Day equals the group's maximum, i.e. here is my expected output:

Serial Day Else
A1 01.01.2022 a
A1 01.01.2022 b
B1 01.01.2020 f

I managed to compute the max, but I don't know how to use it as a filter to get the expected output.

df['Day']= pd.to_datetime(df['Day'], format="%d.%m.%Y")
df = df.groupby(['Serial'])['Day'].max()

CodePudding user response:

Here is one way to do it:

# parse the day-first date strings into datetimes so the max is chronological
df['Day2'] = pd.to_datetime(df['Day'], dayfirst=True)

# broadcast each Serial group's max date back onto every row of that group,
# then keep only the rows whose own date equals that max

out = df.loc[df['Day2'].eq(df.groupby('Serial')['Day2'].transform('max'))].drop(columns='Day2')

out
  Serial         Day Else
0     A1  01.01.2022    a
1     A1  01.01.2022    b
5     B1  01.01.2020    f
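
To see why the comparison works, it can help to look at the intermediate values. The following is a minimal sketch of the same idea, assuming the dataframe from the question; the names parsed, group_max and mask are illustrative only and do not appear in the answer above. It parses the dates into a separate Series instead of a helper column, so nothing has to be dropped afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'Serial': ['A1', 'A1', 'A1', 'B1', 'B1', 'B1'],
    'Day': ['01.01.2022', '01.01.2022', '01.01.2021',
            '01.01.2019', '01.01.2019', '01.01.2020'],
    'Else': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Parse into a separate Series so the original string column stays untouched.
parsed = pd.to_datetime(df['Day'], format='%d.%m.%Y')

# transform('max') returns one value per original row:
# the maximum date of the Serial group that row belongs to.
group_max = parsed.groupby(df['Serial']).transform('max')

# A row is kept when its own date equals its group's maximum.
mask = parsed.eq(group_max)
out = df[mask]
print(out)
```

Because group_max has the same index as df, the element-wise comparison lines up row by row, and the resulting boolean mask can be used directly to index the dataframe.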

CodePudding user response:

Based on this answer, you should first build a boolean mask that marks the rows where the date equals the maximum of its group. Then you can use that mask to index your dataframe. Something like this:

import pandas as pd

df = pd.DataFrame({'Serial' : ['A1', 'A1', 'A1', 'B1','B1', 'B1'],'Day' : ['01.01.2022', '01.01.2022', '01.01.2021', '01.01.2019', '01.01.2019', '01.01.2020'],'Else' : ['a', 'b', 'c', 'd','e', 'f']})
df['Day'] = pd.to_datetime(df['Day'], format="%d.%m.%Y")
# idx is a boolean mask: True where the row's date equals its group's max date
idx = df.groupby(['Serial'])['Day'].transform('max') == df['Day']
print(df[idx])

This gives you the following result:

  Serial        Day Else
0     A1 2022-01-01    a
1     A1 2022-01-01    b
5     B1 2020-01-01    f
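
A short sketch of why a boolean mask is used here rather than a single index per group: as far as I know, groupby idxmax returns only the first row label holding the maximum in each group, so tied rows such as the two A1 rows on 2022-01-01 would be lost. The names first_only and all_ties below are made up for this comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'Serial': ['A1', 'A1', 'A1', 'B1', 'B1', 'B1'],
    'Day': ['01.01.2022', '01.01.2022', '01.01.2021',
            '01.01.2019', '01.01.2019', '01.01.2020'],
    'Else': ['a', 'b', 'c', 'd', 'e', 'f'],
})
df['Day'] = pd.to_datetime(df['Day'], format='%d.%m.%Y')

# idxmax yields one index label per group (the first maximum), so ties are dropped.
first_only = df.loc[df.groupby('Serial')['Day'].idxmax()]

# Comparing against transform('max') keeps every row that ties for the maximum.
all_ties = df[df['Day'] == df.groupby('Serial')['Day'].transform('max')]

print(first_only)
print(all_ties)
```

For the expected output in the question, which contains both A1 rows, the mask-based version is the one that matches.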