Home > Back-end >  Pandas group by rolling window function on a timestamp field
Pandas group by rolling window function on a timestamp field

Time:12-04

I want to add dates and days that are contained in a column after grouping by an ID column.

The following generates an example df:

df = pd.DataFrame(
    {
        "ID":[1,1,1,1,2,2,2,3,3,3,3,3,3],
        "Date":list(pd.date_range("2018-1-1", "2018-4-10", periods=4))   list(pd.date_range("2018-6-6", "2018-7-30", periods=3))   list(pd.date_range("2018-1-1", "2020-1-1", periods=6))
    }
)
df['date_intervals'] = df.groupby('ID').Date.diff()
df['new_date_intermediate'] = df.date_intervals.mask(pd.isnull, df['Date'])

This results in this df: enter image description here

Grouped by the ID field, I want a cumulative sum returning dates.

For example, for ID = 1, I want a vector of the first row the second row, this would be 2018-01-01 33 days, followed by the result of that sum plus the third row which is adding another 33 days.

CodePudding user response:

You can just do cumsum

df['new_date_intermediate'] = df.groupby('ID')['new_date_intermediate'].apply(lambda x :x.cumsum())
df
    ID       Date date_intervals new_date_intermediate
0    1 2018-01-01            NaT   2018-01-01 00:00:00
1    1 2018-02-03        33 days   2018-02-03 00:00:00
2    1 2018-03-08        33 days   2018-03-08 00:00:00
3    1 2018-04-10        33 days   2018-04-10 00:00:00
4    2 2018-06-06            NaT   2018-06-06 00:00:00
5    2 2018-07-03        27 days   2018-07-03 00:00:00
6    2 2018-07-30        27 days   2018-07-30 00:00:00
7    3 2018-01-01            NaT   2018-01-01 00:00:00
8    3 2018-05-27       146 days   2018-05-27 00:00:00
9    3 2018-10-20       146 days   2018-10-20 00:00:00
10   3 2019-03-15       146 days   2019-03-15 00:00:00
11   3 2019-08-08       146 days   2019-08-08 00:00:00
12   3 2020-01-01       146 days   2020-01-01 00:00:00
  • Related