Expand (unbalanced) rows between two dates-CodePudding

I have a dataframe with monthly data. A sample is below

df = pd.DataFrame({'date': ['2010-02-28', '2010-04-30', '2010-01-31', '2010-03-31'], 'group': ['A', 'A', 'B', 'B']})

I want to insert the month end dates between the starting date and the ending date for each group. The problem is each group has different starting and ending dates.

Below is what I tried. However, the result has the same starting date and ending date and the new_date-group combination is not unique.

df['new_date'] = df.apply(lambda x: pd.date_range(start='2010-01-31', end='2010-04-30', freq = 'M'), axis=1)
df = df.explode('new_date').reset_index(drop=True)

CodePudding user response：

You can use pd.date_range in a groupby_apply:

>>> (df.groupby('group')['date']
       .apply(lambda x: pd.Series(pd.date_range(x.iloc[0], x.iloc[1], freq='M')))
       .droplevel(1).reset_index())

  group       date
0     A 2010-02-28
1     A 2010-03-31
2     A 2010-04-30
3     B 2010-01-31
4     B 2010-02-28
5     B 2010-03-31

CodePudding user response：

Another possible solution:

df['date'] = pd.to_datetime(df['date'])

(df.set_index('date')
 .groupby('group', as_index=False)
 .resample('M').asfreq().reset_index()
 .ffill().drop('level_0', axis=1))

Output:

        date group
0 2010-02-28     A
1 2010-03-31     A
2 2010-04-30     A
3 2010-01-31     B
4 2010-02-28     B
5 2010-03-31     B