Home > Back-end >  Cumulative metric in continuous date range based on non-continuous date changes
Cumulative metric in continuous date range based on non-continuous date changes

Time:10-02

On specific dates, a metric starting at 0 increases by a value. Given a set of non-continuous dates and values, is it possible to produce a column with metric?

Input - metric changes per day

date                value
02-03-2022 00:00:00 10
03-03-2022 00:00:00 0
06-03-2022 00:00:00 2
10-03-2022 00:00:00 18

Output - metric calculated for continuous range of days (starting value = 0 unless change applies already on first day)

    0           metric
0   2022-02-28  0
1   2022-03-01  0
2   2022-03-02  10
3   2022-03-03  10
4   2022-03-04  10
5   2022-03-05  10
6   2022-03-06  12
7   2022-03-07  12
8   2022-03-08  12
9   2022-03-09  12
10  2022-03-10  30
11  2022-03-11  30
12  2022-03-12  30
13  2022-03-13  30

Code example

import pandas as pd

df = pd.DataFrame({'date': ['02-03-2022 00:00:00',
                            '03-03-2022 00:00:00',
                            '06-03-2022 00:00:00',
                            '10-03-2022 00:00:00'],
                   'value': [10, 0, 2, 18]},
                   index=[0,1,2,3])

df2 = pd.DataFrame(pd.date_range(start='28-02-2022', end='13-03-2022'))
df2['metric'] = 0  # TODO

CodePudding user response:

Replace values in df2 from df by date, fill missing values with 0 and then cumsum:

df['date'] = pd.to_datetime(df.date, format='%d-%m-%Y %H:%M:%S')
df2['metric'] = df2[0].map(df.set_index('date')['value']).fillna(0).cumsum()
df2

            0  metric
0  2022-02-28     0.0
1  2022-03-01     0.0
2  2022-03-02    10.0
3  2022-03-03    10.0
4  2022-03-04    10.0
5  2022-03-05    10.0
6  2022-03-06    12.0
7  2022-03-07    12.0
8  2022-03-08    12.0
9  2022-03-09    12.0
10 2022-03-10    30.0
11 2022-03-11    30.0
12 2022-03-12    30.0
13 2022-03-13    30.0

CodePudding user response:

df.reindex is useful for this. Then add df.fillna and apply df.cumsum.

import pandas as pd

df = pd.DataFrame({'date': ['02-03-2022 00:00:00',
                            '03-03-2022 00:00:00',
                            '06-03-2022 00:00:00',
                            '10-03-2022 00:00:00'],
                   'value': [10, 0, 2, 18]},
                   index=[0,1,2,3])

df['date'] = pd.to_datetime(df.date, format='%d-%m-%Y %H:%M:%S')

res = df.set_index('date').reindex(pd.date_range(
    start='2022-02-28', end='2022-03-13')).fillna(0).cumsum()\
    .reset_index(drop=False).rename(columns={'index':'date',
                                             'value':'metric'})
    
print(res)

         date  metric
0  2022-02-28     0.0
1  2022-03-01     0.0
2  2022-03-02    10.0
3  2022-03-03    10.0
4  2022-03-04    10.0
5  2022-03-05    10.0
6  2022-03-06    12.0
7  2022-03-07    12.0
8  2022-03-08    12.0
9  2022-03-09    12.0
10 2022-03-10    30.0
11 2022-03-11    30.0
12 2022-03-12    30.0
13 2022-03-13    30.0
  • Related