I have a dataframe with monthly data. A sample is below
df = pd.DataFrame({'date': ['2010-02-28', '2010-04-30', '2010-01-31', '2010-03-31'], 'group': ['A', 'A', 'B', 'B']})
I want to insert the month end dates between the starting date and the ending date for each group. The problem is each group has different starting and ending dates.
Below is what I tried. However, the result has the same starting date and ending date and the new_date-group combination is not unique.
df['new_date'] = df.apply(lambda x: pd.date_range(start='2010-01-31', end='2010-04-30', freq = 'M'), axis=1)
df = df.explode('new_date').reset_index(drop=True)
CodePudding user response:
You can use pd.date_range
in a groupby_apply
:
>>> (df.groupby('group')['date']
.apply(lambda x: pd.Series(pd.date_range(x.iloc[0], x.iloc[1], freq='M')))
.droplevel(1).reset_index())
group date
0 A 2010-02-28
1 A 2010-03-31
2 A 2010-04-30
3 B 2010-01-31
4 B 2010-02-28
5 B 2010-03-31
CodePudding user response:
Another possible solution:
df['date'] = pd.to_datetime(df['date'])
(df.set_index('date')
.groupby('group', as_index=False)
.resample('M').asfreq().reset_index()
.ffill().drop('level_0', axis=1))
Output:
date group
0 2010-02-28 A
1 2010-03-31 A
2 2010-04-30 A
3 2010-01-31 B
4 2010-02-28 B
5 2010-03-31 B