I have a pandas data frame and my goal is to filter out data at the end of each month (i.e. 2021-12-31, 2022-01-31, etc).
I have 2 sets of code and 1 works but the other doesn't even though they the type(criteria) is the same - string.
Works:
df_monthly = df[df.index == str(dt.date(2022,1,31))]
Does not work:
df_monthly = df[df.index == str(pd.Period(df.index, freq='M').end_time.date())]
The 2nd line returns this error: ValueError: Value must be Period, string, integer, or datetime
Anyone knows why this is the case and how do I remedy? I'm hoping to be able to proceed using pd.Period() instead of dt.date().
Many thanks!
CodePudding user response:
If the index is a DatetimeIndex
, you can call .to_period
followed by .to_timestamp
on the index to do this:
df_monthly = df.loc[df.index == df.index.to_period('M').to_timestamp('M')]
If the index needs to be converted first, you can use pd.to_datetime
(and optionally set errors='coerce'
to handle invalid dates)
Another succinct option is to use offsets.MonthEnd
:
df_monthly = df.loc[df.index == df.index pd.offsets.MonthEnd(0)]