I have a dataframe with a date column and an input column, and I want to keep only the row with the latest date in each month of each year.
The table looks like this:
date input
2017-04-06 0.036
2017-04-13 0.055
2017-04-20 0.07
2017-04-27 0.052
2017-05-04 0.039
2017-05-11 0.045
2017-05-18 0.033
2017-05-25 0.025
2017-06-01 0.018
2017-06-08 0.057
2017-06-15 0.05
2017-06-22 0.035
2017-06-29 0.038
The expected dataframe is:
date input
2017-04-27 0.052
2017-05-25 0.025
2017-06-29 0.038
I have tried using .max(), but that only gives me the single latest date in the whole column, while I need the full row (all columns) for the latest date of each month and year.
How can I do this in Python?
Thanks in advance
CodePudding user response:
In your case, use groupby with tail:
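# If the column holds strings, convert it first: df['date'] = pd.to_datetime(df['date'])
# tail(1) keeps the last row of each year-month group, so df should already be sorted by date
out = df.groupby(df.date.dt.strftime('%Y-%m')).tail(1)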
Out[919]:
date input
3 2017-04-27 0.052
7 2017-05-25 0.025
12 2017-06-29 0.038
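For anyone who wants to run this end to end, here is a minimal self-contained sketch of the groupby/tail approach above. It rebuilds the sample data from the question and adds two steps that are my assumptions rather than part of the answer: an explicit pd.to_datetime conversion and a sort_values call, since tail(1) only returns the latest date if each group is in date order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": [
        "2017-04-06", "2017-04-13", "2017-04-20", "2017-04-27",
        "2017-05-04", "2017-05-11", "2017-05-18", "2017-05-25",
        "2017-06-01", "2017-06-08", "2017-06-15", "2017-06-22", "2017-06-29",
    ],
    "input": [
        0.036, 0.055, 0.07, 0.052,
        0.039, 0.045, 0.033, 0.025,
        0.018, 0.057, 0.05, 0.035, 0.038,
    ],
})

# Assumption: the dates arrive as strings, so convert them to datetime.
df["date"] = pd.to_datetime(df["date"])

# Assumption: sort defensively so the last row of each group is the latest date.
df = df.sort_values("date")

# Group by a year-month key and keep the last row of every group.
out = df.groupby(df["date"].dt.strftime("%Y-%m")).tail(1)
print(out)
```

Because tail returns rows of the original frame, the original index labels are preserved; call reset_index(drop=True) if you want a fresh 0..n index.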
CodePudding user response:
Use GroupBy.last with Series.dt.strftime:
df = df.groupby(df['date'].dt.strftime('%Y-%m'), as_index=False).last()
print(df)
date input
0 2017-04-27 0.052
1 2017-05-25 0.025
2 2017-06-29 0.038
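One thing to keep in mind with GroupBy.last: it takes the last non-null value of each column per group, and it relies on the rows being in date order. If the frame might be unsorted, a possible alternative (a sketch, not what either answer proposes) is to look up the row holding the maximum date in each month with idxmax. The data below is a shuffled subset of the question's sample, and the names month, latest_idx and result are just illustrative.

```python
import pandas as pd

# A shuffled subset of the question's sample, to show that row order does not matter here.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-25", "2017-04-27", "2017-04-06",
        "2017-06-29", "2017-05-04", "2017-06-01",
    ]),
    "input": [0.025, 0.052, 0.036, 0.038, 0.039, 0.018],
})

# Year-month key for every row (a monthly Period such as 2017-04).
month = df["date"].dt.to_period("M")

# For each month, find the index label of the row with the latest date.
latest_idx = df.groupby(month)["date"].idxmax()

# Select those complete rows; groups come back in chronological month order.
result = df.loc[latest_idx].reset_index(drop=True)
print(result)
```

This selects whole rows by label, so every column in the result comes from the same original row.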