Home > Enterprise >  How to calculate the summation for values based on consecutive days and two other columns
How to calculate the summation for values based on consecutive days and two other columns

Time:12-02

How can I do summation just for consecutive days and for the same name and same supplier? For instance, for A and Supplier Wal, I need to do summation for 2021-05-31 and 2021-06-01 and then do another summation for 2021-06-08 and 2021-06-09. I need to add a new column for summation. Please take a look at the example below:

enter image description here

Here is the Pandas DataFrame code for the table:

df = pd.DataFrame({'Name': ['A', 'A', 'A','A','B','B','C','C','C','C','C','C','C','C','C'],
  'Supplier': ['Wal', 'Wal', 'Wal', 'Wal', 'Co', 'Co', 'Mc', 'Mc', 'St', 'St', 'St', 'St', 'St', 'To', 'To'],
  'Date': ['2021-05-31', '2021-06-01', '2021-06-08', '2021-06-09', '2021-05-17', '2021-05-18'
          , '2021-04-07', '2021-04-08', '2021-05-11', '2021-05-12', '2021-05-13', '2021-05-18'
          , '2021-05-19', '2021-03-30', '2021-03-31'],
  'Amount': [27, 400, 410, 250, 100, 50, 22, 78, 60, 180, 100, 240, 140, 30, 110],
  'Summation': [427,427,660,660,150,150,100,100,340,340,340,380,380,140,140 ]})

CodePudding user response:

Like this?

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'A', 'A','A','B','B','C','C','C','C','C','C','C','C','C'],
  'Supplier': ['Wal', 'Wal', 'Wal', 'Wal', 'Co', 'Co', 'Mc', 'Mc', 'St', 'St', 'St', 'St', 'St', 'To', 'To'],
  'Date': ['2021-05-31', '2021-06-01', '2021-06-08', '2021-06-09', '2021-05-17', '2021-05-18'
          , '2021-04-07', '2021-04-08', '2021-05-11', '2021-05-12', '2021-05-13', '2021-05-18'
          , '2021-05-19', '2021-03-30', '2021-03-31'],
  'Amount': [27, 400, 410, 250, 100, 50, 22, 78, 60, 180, 100, 240, 140, 30, 110]})

df['Date'] = pd.to_datetime(df['Date'])
filt = df.loc[((df['Date'] - df['Date'].shift(-1)).abs() == pd.Timedelta('1d')) | (df['Date'].diff() == pd.Timedelta('1d'))]
breaks = filt['Date'].diff() != pd.Timedelta('1d')
df['Summation'] = df.groupby(['Name','Supplier',breaks.cumsum()])['Amount'].transform('sum')

print(df)

output:

   Name Supplier       Date  Amount  Summation
0     A      Wal 2021-05-31      27        427
1     A      Wal 2021-06-01     400        427
2     A      Wal 2021-06-08     410        660
3     A      Wal 2021-06-09     250        660
4     B       Co 2021-05-17     100        150
5     B       Co 2021-05-18      50        150
6     C       Mc 2021-04-07      22        100
7     C       Mc 2021-04-08      78        100
8     C       St 2021-05-11      60        340
9     C       St 2021-05-12     180        340
10    C       St 2021-05-13     100        340
11    C       St 2021-05-18     240        380
12    C       St 2021-05-19     140        380
13    C       To 2021-03-30      30        140
14    C       To 2021-03-31     110        140
  • Related