How to sum amount for all rows if the current rows close date falls between the other rows close and-CodePudding

I have a dataframe with "close_date", "open_date", "amount", "sales_rep".

sales_rep	open_date(MM/DD/YYYY)	close_date	amount
Jim	1/01/2021	2/05/2021	3
Jim	1/15/2021	4/06/2021	26
Jim	2/01/2021	2/06/2021	7
Jim	2/15/2021	3/14/2021	12
Jim	3/01/2021	4/22/2021	13
Jim	3/15/2021	3/29/2021	5
Jim	4/01/2021	4/20/2021	17
Bob	1/01/2021	1/12/2021	23
Bob	1/15/2021	2/16/2021	12
Bob	2/01/2021	3/04/2021	4
Bob	2/15/2021	4/05/2021	23
Bob	3/01/2021	3/24/2021	12
Bob	3/15/2021	4/15/2021	7
Bob	4/01/2021	5/01/2021	20

I want to create a column that tells me the open amount. So if we take the second row we can see that the opp was closed on 04/06/2021. I want to know how many open opps there were before that date. So I would look to see if the open date for row 5 was before the close date of 4/06/2021 and that the close date for row 5 is also after 04/06/2021. In this case it is so I would add that to the sum. I also want to current row value to be included in the sum. This should be done for each sales rep in the dataframe. I have filled in the table with the expected values below.

sales_rep	open_date(MM/DD/YYYY)	close_date	amount	open_amount_sum
Jim	1/01/2021	2/05/2021	3	36 (I got this by adding 3, 26, and 7 because those are the only two values that fit the condition and the 3 because it is the value for that row.)
Jim	1/15/2021	4/06/2021	26	56
Jim	2/01/2021	2/06/2021	7	33
Jim	2/15/2021	3/14/2021	12	51
Jim	3/01/2021	4/22/2021	13	13
Jim	3/15/2021	3/29/2021	5	44
Jim	4/01/2021	4/20/2021	17	30
Bob	1/01/2021	1/12/2021	23	23
Bob	1/15/2021	2/16/2021	12	39
Bob	2/01/2021	3/04/2021	4	39
Bob	2/15/2021	4/05/2021	23	50
Bob	3/01/2021	3/24/2021	12	42
Bob	3/15/2021	4/15/2021	7	27
Bob	4/01/2021	5/01/2021	20	20

CodePudding user response：

Edit @RJ's solution from the comments is better. here it is formatted slightly differently

df['open_amount_sum'] = df.apply(
    lambda x: df[
        df['sales_rep'].eq(x['sales_rep']) &
        df['open_date'].le(x['close_date']) &
        df['close_date'].ge(x['close_date'])
    ]['amount'].sum(),
    axis=1,
)

Here is a solution, but it is slow and kind of ugly. can definitely be improved

import pandas as pd
import io

df = pd.read_csv(io.StringIO(
"""
sales_rep,open_date,close_date,amount
Jim,1/01/2021,2/05/2021,3
Jim,1/15/2021,4/06/2021,26
Jim,2/01/2021,2/06/2021,7
Jim,2/15/2021,3/14/2021,12
Jim,3/01/2021,4/22/2021,13
Jim,3/15/2021,3/29/2021,5
Jim,4/01/2021,4/20/2021,17
Bob,1/01/2021,1/12/2021,23
Bob,1/15/2021,2/16/2021,12
Bob,2/01/2021,3/04/2021,4
Bob,2/15/2021,4/05/2021,23
Bob,3/01/2021,3/24/2021,12
Bob,3/15/2021,4/15/2021,7
Bob,4/01/2021,5/01/2021,20
"""
))


sum_df = df.groupby('sales_rep').apply(
    lambda g:
        g['close_date'].apply(
            lambda close:
                g.loc[
                    g['open_date'].le(close) & g['close_date'].ge(close),
                    'amount'
                ].sum())
    
).reset_index(level=0)

df['close_sum'] = sum_df['close_date']
df

CodePudding user response：

Merge the dataframe unto itself, then filter, before grouping:

(df
 .merge(df, on='sales_rep')
 .query('open_date_y <= close_date_x<=close_date_y')
 .loc(axis=1)['sales_rep', 'open_date_x', 'close_date_x', 'amount_x', 'amount_y']
 .rename(columns=lambda col: col.removesuffix('_x'))
 .rename(columns = {'amount_y' : 'open_sum_amount'})
 .groupby(['sales_rep', 'open_date', 'close_date', 'amount'], 
          sort = False, 
          as_index = False)
 .sum()
 )
   sales_rep  open_date close_date  amount  open_sum_amount
0        Jim 2021-01-01 2021-02-05       3               36
1        Jim 2021-01-15 2021-04-06      26               56
2        Jim 2021-02-01 2021-02-06       7               33
3        Jim 2021-02-15 2021-03-14      12               51
4        Jim 2021-03-01 2021-04-22      13               13
5        Jim 2021-03-15 2021-03-29       5               44
6        Jim 2021-04-01 2021-04-20      17               30
7        Bob 2021-01-01 2021-01-12      23               23
8        Bob 2021-01-15 2021-02-16      12               39
9        Bob 2021-02-01 2021-03-04       4               39
10       Bob 2021-02-15 2021-04-05      23               50
11       Bob 2021-03-01 2021-03-24      12               42
12       Bob 2021-03-15 2021-04-15       7               27
13       Bob 2021-04-01 2021-05-01      20               20