Home > Net >  Pandas expand date range with multiple times and forward filling
Pandas expand date range with multiple times and forward filling

Time:09-20

I have a dataframe like this:

DATE       MIN_AMOUNT MAX_AMOUNT MIN_DAY MAX_DAY 
01/09/2022 10         20         1       2 
01/09/2022 15         25         4       5 
01/09/2022 30         50         7       10 
05/09/2022 10         20         1       2 
05/09/2022 15         25         4       5   
07/09/2022 15         25         4       5

I want to expand the dataframe to all date range between the DATE column with forward filling. The desired putput is:

DATE       MIN_AMOUNT MAX_AMOUNT MIN_DAY MAX_DAY 
01/09/2022 10         20         1       2 
01/09/2022 15         25         4       5 
01/09/2022 30         50         7       10 
02/09/2022 10         20         1       2 
02/09/2022 15         25         4       5 
02/09/2022 30         50         7       10 
03/09/2022 10         20         1       2 
03/09/2022 15         25         4       5 
03/09/2022 30         50         7       10 
04/09/2022 10         20         1       2 
04/09/2022 15         25         4       5 
04/09/2022 30         50         7       10 
05/09/2022 10         20         1       2 
05/09/2022 15         25         4       5 
06/09/2022 10         20         1       2 
06/09/2022 15         25         4       5 
07/09/2022 15         25         4       5

Could you please help me about this?

CodePudding user response:

First convert values to datetimes, create helper counter Series g by GroupBy.cumcount for reshape by DataFrame.set_index and DataFrame.unstack, then use DataFrame.asfreq with method='ffill' and reshape back by DataFrame.stack, remove helper level by DataFrame.droplevel, convert DatetimeIndex to column, change format of datetimes and last create same dtypes like original DataFrame:

df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)

g = df.groupby('DATE').cumcount()

df = (df.set_index(['DATE',g])
        .unstack()
        .asfreq('D', method='ffill')
        .stack()
        .droplevel(-1)
        .reset_index()
        .assign(DATE = lambda x: x['DATE'].dt.strftime('%d/%m/%Y'))
        .astype(df.dtypes)
)
print (df)
         DATE  MIN_AMOUNT  MAX_AMOUNT  MIN_DAY  MAX_DAY
0  2022-01-09          10          20        1        2
1  2022-01-09          15          25        4        5
2  2022-01-09          30          50        7       10
3  2022-02-09          10          20        1        2
4  2022-02-09          15          25        4        5
5  2022-02-09          30          50        7       10
6  2022-03-09          10          20        1        2
7  2022-03-09          15          25        4        5
8  2022-03-09          30          50        7       10
9  2022-04-09          10          20        1        2
10 2022-04-09          15          25        4        5
11 2022-04-09          30          50        7       10
12 2022-05-09          10          20        1        2
13 2022-05-09          15          25        4        5
14 2022-06-09          10          20        1        2
15 2022-06-09          15          25        4        5
16 2022-07-09          15          25        4        5

CodePudding user response:

A couple of merges should help with this, and should still be efficient as the data size increases:

Get the unique dates and build a new dataframe from that:

out = df.DATE.drop_duplicates()
dates = pd.date_range(out.min(), out.max(), freq='D')
dates = pd.DataFrame(dates, columns=['dates'])

Merge dates with out, and subsequently merge the outcome with the original dataframe:

(dates
.merge(
    out, 
    left_on='dates',
    right_on='DATE', 
    how = 'left')
# faster to fill on a Series than a Dataframe
.assign(DATE = lambda df: df.DATE.ffill())
.merge(
    df,
    on = 'DATE',
    how = 'left')
.drop(columns='DATE')
.rename(columns= {'dates':'DATE'})
)

         DATE  MIN_AMOUNT  MAX_AMOUNT  MIN_DAY  MAX_DAY
0  2022-09-01          10          20        1        2
1  2022-09-01          15          25        4        5
2  2022-09-01          30          50        7       10
3  2022-09-02          10          20        1        2
4  2022-09-02          15          25        4        5
5  2022-09-02          30          50        7       10
6  2022-09-03          10          20        1        2
7  2022-09-03          15          25        4        5
8  2022-09-03          30          50        7       10
9  2022-09-04          10          20        1        2
10 2022-09-04          15          25        4        5
11 2022-09-04          30          50        7       10
12 2022-09-05          10          20        1        2
13 2022-09-05          15          25        4        5
14 2022-09-06          10          20        1        2
15 2022-09-06          15          25        4        5
16 2022-09-07          15          25        4        5
  • Related