Home > Software engineering >  pandas resample with origin and closed status
pandas resample with origin and closed status

Time:10-22

I have a dataframe for inventory like the following,

tidx = pd.date_range('2022-10-01', periods=15, freq='D') 
data_frame = pd.DataFrame(1, columns=['inventory'], index=tidx)
data_frame.iloc[-2:] = 0
print(data_frame)

inventory
2022-10-01          1
2022-10-02          1
2022-10-03          1
2022-10-04          1
2022-10-05          1
2022-10-06          1
2022-10-07          1
2022-10-08          1
2022-10-09          1
2022-10-10          1
2022-10-11          1
2022-10-12          1
2022-10-13          1
2022-10-14          0
2022-10-15          0

I want to aggregate for 7 days from any day of the week (here 2020-10-15). If I do the following, I don't know why the results start from 2020-10-13

data_frame.resample("7D", closed = 'right', origin='2020-10-15').sum()

inventory
2022-09-29  6
2022-10-06  7
2022-10-13  0

My desired output is,

inventory
2022-09-01  1
2022-10-08  7
2022-10-15  5

Note: My panda's version is '1.3.5'

CodePudding user response:

data_frame.resample("7D", origin='2022-10-15', closed='right', loffset='7D').sum()

Gives:

            inventory
2022-10-01          1
2022-10-08          7
2022-10-15          5

Although loffset is deprecated. So you can do this instead.

from pandas.tseries.frequencies import to_offset
df = data_frame.resample("7D", origin='2022-10-15', closed='right').sum()
df.index = df.index   to_offset("7D")

Additionally, a more elegant solution to this would be:

data_frame.resample("7D", origin='end', closed='right').sum()

CodePudding user response:

I found a fair solution,

data_frame.resample("W-SAT", closed = 'right').sum()

inventory
2022-10-01  1
2022-10-08  7
2022-10-15  5


The only caveat is to calculate the start date name which is SAT (for 2022-10-15)

  • Related