I have two datasets with monthly frequency. For one of them, df, I had to aggregate daily data into monthly data using the following code:
df_grouped = df.groupby([df.index.to_period('M'), 'City ID']).agg(
    {'Estimated Population': 'mean', 'Estimated Population_2019': 'mean',
     'Confirmed Rate': ['mean', 'std'], 'Death Rate': ['mean', 'std'],
     'New Confirmed': 'sum', 'New Deaths': 'sum'})
df_grouped.index.rename(['Month', 'City ID'], inplace=True)
After these changes, my dates are in the format YYYY-MM, for example:
2020-01
2020-02
...
2021-07
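Note that the Month level created by to_period('M') holds pandas Period values (dtype period[M]), not strings; they only print as YYYY-MM. A minimal sketch with made-up daily data for a single city and only one of the columns above (the data and the reduced column set are assumptions for illustration):

```python
import pandas as pd

# Hypothetical daily data for one city (values are made up for illustration)
days = pd.date_range("2020-01-01", "2020-02-29", freq="D")
df = pd.DataFrame({"City ID": 1, "New Deaths": 1}, index=days)

# Same grouping pattern as above, reduced to a single summed column
df_grouped = df.groupby([df.index.to_period("M"), "City ID"]).agg({"New Deaths": "sum"})
# Name the index levels (same effect as the rename call above)
df_grouped = df_grouped.rename_axis(["Month", "City ID"])

months = df_grouped.index.get_level_values("Month")
print(months.dtype)                       # period[M]
print(list(months.astype(str)))           # ['2020-01', '2020-02']
print(df_grouped["New Deaths"].tolist())  # [31, 29]
```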
My other dataset, df2, has its dates in the format YYYYMM, so I used the following code to convert them:
df2['DATE'] = pd.to_datetime(df2['DATE'],format='%Y%m')
The converted dates are now in the format YYYY-MM-DD, with every DD equal to 01:
2020-01-01
2020-02-01
...
2021-07-01
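The day becomes 01 because the format '%Y%m' contains no day field and a datetime column always stores a complete date, so pandas fills in the first day of the month. A small sketch, assuming DATE holds YYYYMM strings (integer values would need .astype(str) first); the sample values are made up:

```python
import pandas as pd

# Hypothetical raw values in YYYYMM form, stored as strings
df2 = pd.DataFrame({"DATE": ["202001", "202002", "202107"]})
df2["DATE"] = pd.to_datetime(df2["DATE"], format="%Y%m")

# Every parsed date falls on day 1 of its month
print(df2["DATE"].dt.day.tolist())                    # [1, 1, 1]
print(df2["DATE"].dt.strftime("%Y-%m-%d").tolist())   # ['2020-01-01', '2020-02-01', '2021-07-01']
```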
How can I now convert the df2 dates from YYYY-MM-DD to YYYY-MM?
Alternatively, is there a way to convert directly from YYYYMM to YYYY-MM?
Answer:
You can use either of the following on the original YYYYMM column (both return strings such as '2020-01'):
df2['DATE'] = pd.to_datetime(df2['DATE'], format='%Y%m').dt.strftime('%Y-%m')
# or
df2['DATE'] = df2['DATE'].astype(str).str.replace(r'(\d{4})(\d{2})', r'\1-\2', regex=True)
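A quick check that both options give the same strings, plus the case where the column has already been converted to datetime (then only the strftime step is needed). The sample values are made up, and they are assumed to be stored as integers:

```python
import pandas as pd

# Hypothetical raw YYYYMM values, stored as integers
raw = pd.Series([202001, 202002, 202107], name="DATE")

# Option 1: parse with an explicit format, then format back as 'YYYY-MM' strings
opt1 = pd.to_datetime(raw.astype(str), format="%Y%m").dt.strftime("%Y-%m")

# Option 2: insert a hyphen with a regex, never creating datetimes
opt2 = raw.astype(str).str.replace(r"(\d{4})(\d{2})", r"\1-\2", regex=True)

# If the column is already datetime64, only the formatting step is needed
already_dt = pd.to_datetime(raw.astype(str), format="%Y%m")
opt3 = already_dt.dt.strftime("%Y-%m")

print(opt1.tolist())  # ['2020-01', '2020-02', '2021-07']
```

Keep in mind that these results are plain strings, whereas the Month level of df_grouped holds Period values, so the two will not compare equal without a further conversion.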
Answer:
Convert your column to a PeriodIndex:
>>> pd.PeriodIndex(df2['DATE'], freq='M')
PeriodIndex(['2020-09', '2020-10', '2020-11', '2020-12', '2021-01', '2021-02'],
dtype='period[M]', name='DATE')
Input data:
import pandas as pd

dti = pd.date_range('2020-9', '2021-3', freq='M')
df2 = pd.DataFrame({'DATE': dti.strftime('%Y%m')})
print(df2)
# Output
     DATE
0  202009
1  202010
2  202011
3  202012
4  202101
5  202102
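Because df_grouped's Month level already holds Period values, converting df2 to monthly periods (rather than strings) lets the two datasets be aligned directly. A minimal sketch, assuming the goal is to join on month; the Value column and the small monthly frame standing in for df_grouped are hypothetical. It parses with an explicit format and then calls .dt.to_period('M'), an alternative to passing the raw strings to pd.PeriodIndex:

```python
import pandas as pd

# Hypothetical df2 with YYYYMM strings and one made-up value column
df2 = pd.DataFrame({"DATE": ["202001", "202002"], "Value": [10, 20]})

# Parse explicitly, then convert to monthly periods (same dtype as to_period('M'))
df2["Month"] = pd.to_datetime(df2["DATE"], format="%Y%m").dt.to_period("M")

# Hypothetical stand-in for a monthly, Period-indexed frame like df_grouped
monthly = pd.DataFrame(
    {"New Deaths": [31, 29]},
    index=pd.period_range("2020-01", periods=2, freq="M", name="Month"),
)

# Align on the Period index; no string conversion is needed on either side
merged = monthly.join(df2.set_index("Month")["Value"])
print(merged)
```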
Answer:
Use:
import pandas as pd

df = pd.DataFrame({'date': ['2020-01-01', '2020-02-01']})
pd.to_datetime(df['date']).dt.to_period('M')
Output: