Home > database >  How to sum amount for all rows if the current rows close date falls between the other rows close and
How to sum amount for all rows if the current rows close date falls between the other rows close and

Time:11-20

I have a dataframe with "close_date", "open_date", "amount", "sales_rep".

sales_rep open_date(MM/DD/YYYY) close_date amount
Jim 1/01/2021 2/05/2021 3
Jim 1/15/2021 4/06/2021 26
Jim 2/01/2021 2/06/2021 7
Jim 2/15/2021 3/14/2021 12
Jim 3/01/2021 4/22/2021 13
Jim 3/15/2021 3/29/2021 5
Jim 4/01/2021 4/20/2021 17
Bob 1/01/2021 1/12/2021 23
Bob 1/15/2021 2/16/2021 12
Bob 2/01/2021 3/04/2021 4
Bob 2/15/2021 4/05/2021 23
Bob 3/01/2021 3/24/2021 12
Bob 3/15/2021 4/15/2021 7
Bob 4/01/2021 5/01/2021 20

I want to create a column that tells me the open amount. So if we take the second row we can see that the opp was closed on 04/06/2021. I want to know how many open opps there were before that date. So I would look to see if the open date for row 5 was before the close date of 4/06/2021 and that the close date for row 5 is also after 04/06/2021. In this case it is so I would add that to the sum. I also want to current row value to be included in the sum. This should be done for each sales rep in the dataframe. I have filled in the table with the expected values below.

sales_rep open_date(MM/DD/YYYY) close_date amount open_amount_sum
Jim 1/01/2021 2/05/2021 3 36 (I got this by adding 3, 26, and 7 because those are the only two values that fit the condition and the 3 because it is the value for that row.)
Jim 1/15/2021 4/06/2021 26 56
Jim 2/01/2021 2/06/2021 7 33
Jim 2/15/2021 3/14/2021 12 51
Jim 3/01/2021 4/22/2021 13 13
Jim 3/15/2021 3/29/2021 5 44
Jim 4/01/2021 4/20/2021 17 30
Bob 1/01/2021 1/12/2021 23 23
Bob 1/15/2021 2/16/2021 12 39
Bob 2/01/2021 3/04/2021 4 39
Bob 2/15/2021 4/05/2021 23 50
Bob 3/01/2021 3/24/2021 12 42
Bob 3/15/2021 4/15/2021 7 27
Bob 4/01/2021 5/01/2021 20 20

CodePudding user response:

Edit @RJ's solution from the comments is better. here it is formatted slightly differently

df['open_amount_sum'] = df.apply(
    lambda x: df[
        df['sales_rep'].eq(x['sales_rep']) &
        df['open_date'].le(x['close_date']) &
        df['close_date'].ge(x['close_date'])
    ]['amount'].sum(),
    axis=1,
)

Here is a solution, but it is slow and kind of ugly. can definitely be improved

import pandas as pd
import io

df = pd.read_csv(io.StringIO(
"""
sales_rep,open_date,close_date,amount
Jim,1/01/2021,2/05/2021,3
Jim,1/15/2021,4/06/2021,26
Jim,2/01/2021,2/06/2021,7
Jim,2/15/2021,3/14/2021,12
Jim,3/01/2021,4/22/2021,13
Jim,3/15/2021,3/29/2021,5
Jim,4/01/2021,4/20/2021,17
Bob,1/01/2021,1/12/2021,23
Bob,1/15/2021,2/16/2021,12
Bob,2/01/2021,3/04/2021,4
Bob,2/15/2021,4/05/2021,23
Bob,3/01/2021,3/24/2021,12
Bob,3/15/2021,4/15/2021,7
Bob,4/01/2021,5/01/2021,20
"""
))


sum_df = df.groupby('sales_rep').apply(
    lambda g:
        g['close_date'].apply(
            lambda close:
                g.loc[
                    g['open_date'].le(close) & g['close_date'].ge(close),
                    'amount'
                ].sum())
    
).reset_index(level=0)

df['close_sum'] = sum_df['close_date']
df

CodePudding user response:

Merge the dataframe unto itself, then filter, before grouping:

(df
 .merge(df, on='sales_rep')
 .query('open_date_y <= close_date_x<=close_date_y')
 .loc(axis=1)['sales_rep', 'open_date_x', 'close_date_x', 'amount_x', 'amount_y']
 .rename(columns=lambda col: col.removesuffix('_x'))
 .rename(columns = {'amount_y' : 'open_sum_amount'})
 .groupby(['sales_rep', 'open_date', 'close_date', 'amount'], 
          sort = False, 
          as_index = False)
 .sum()
 )
   sales_rep  open_date close_date  amount  open_sum_amount
0        Jim 2021-01-01 2021-02-05       3               36
1        Jim 2021-01-15 2021-04-06      26               56
2        Jim 2021-02-01 2021-02-06       7               33
3        Jim 2021-02-15 2021-03-14      12               51
4        Jim 2021-03-01 2021-04-22      13               13
5        Jim 2021-03-15 2021-03-29       5               44
6        Jim 2021-04-01 2021-04-20      17               30
7        Bob 2021-01-01 2021-01-12      23               23
8        Bob 2021-01-15 2021-02-16      12               39
9        Bob 2021-02-01 2021-03-04       4               39
10       Bob 2021-02-15 2021-04-05      23               50
11       Bob 2021-03-01 2021-03-24      12               42
12       Bob 2021-03-15 2021-04-15       7               27
13       Bob 2021-04-01 2021-05-01      20               20
  • Related