Groupby & Sum - Create new column with added If Condition-CodePudding

I have the below DataFrame:

ID  Start           End              Variance
1   100000          120000           20000
1   1               0                -1
1   7815.58         7815.58          0
1   5261            5261             0
1   138783.2        89969.37         -48813.83
1   2459.92         2459.92          0
2   101421.99       93387.45         -8034.54
2   940.04          940.04           0
2   63.06           63.06            0
2   2454.86         2454.86          0
2   830             830              0
2   299             299              0
2   14000           12000            2000
2   1500            500              1000

I want to create a new column, Overspend Total. But I only want to sum the values that are greater than 0. The resulting DataFrame will look like this:

ID  Start           End              Variance        Overspend Total
1   100000          120000           20000           20000
1   1               0                -1              20000
1   7815.58         7815.58          0               20000
1   5261            5261             0               20000
1   138783.2        89969.37         -48813.83       20000
1   2459.92         2459.92          0               20000
2   101421.99       93387.45         -8034.54        3000
2   940.04          940.04           0               3000
2   63.06           63.06            0               3000
2   2454.86         2454.86          0               3000
2   830             830              0               3000
2   299             299              0               3000
2   14000           12000            2000            3000
2   1500            500              1000            3000

I tried the following

df['Overspend Variance'] = df[df['Variance'] > 0].groupby(df['ID']).transform('sum')

But I'm getting the below error:

ValueError: Wrong number of items passed 8, placement implies 1

I know df['Overspend Variance'] = df['Variance'].groupby(df['ID']).transform('sum') would work without a condition, but I can't figure out how to incorporate it with an extra condition.

CodePudding user response：

We can use Series.where to replace the values that don't match the condition with NaN, then just groupby transform 'sum' since NaN values are ignored by 'sum' by default:

df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0).groupby(df['ID']).transform('sum')
)

Or explicitly replace with the additive identity (0) which will not affect the sum:

df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0, 0)
        .groupby(df['ID']).transform('sum')
)

Or with a lambda inside groupby transform:

df['Overspend Total'] = df.groupby('ID')['Variance'].transform(
    lambda s: s[s > 0].sum()
)

In any case df is:

    ID      Start        End  Variance  Overspend Total
0    1  100000.00  120000.00  20000.00          20000.0
1    1       1.00       0.00     -1.00          20000.0
2    1    7815.58    7815.58      0.00          20000.0
3    1    5261.00    5261.00      0.00          20000.0
4    1  138783.20   89969.37 -48813.83          20000.0
5    1    2459.92    2459.92      0.00          20000.0
6    2  101421.99   93387.45  -8034.54           3000.0
7    2     940.04     940.04      0.00           3000.0
8    2      63.06      63.06      0.00           3000.0
9    2    2454.86    2454.86      0.00           3000.0
10   2     830.00     830.00      0.00           3000.0
11   2     299.00     299.00      0.00           3000.0
12   2   14000.00   12000.00   2000.00           3000.0
13   2    1500.00     500.00   1000.00           3000.0

CodePudding user response：

it could be done by filter the value less than 0 than group by and reassign

df = df.join(df[df.Variance>=0].groupby("ID")["Variance"].agg(sum),  on="ID", rsuffix="total")
df.columns = ["ID", "Start", "End", "Variance", "Overspend Total"]

    ID  Start   End Variance    Overspend Total
0   1   100000.00   120000.00   20000.00    20000.0
1   1   1.00    0.00    -1.00   20000.0
2   1   7815.58 7815.58 0.00    20000.0
3   1   5261.00 5261.00 0.00    20000.0
4   1   138783.20   89969.37    -48813.83   20000.0
5   1   2459.92 2459.92 0.00    20000.0
6   2   101421.99   93387.45    -8034.54    3000.0
7   2   940.04  940.04  0.00    3000.0
8   2   63.06   63.06   0.00    3000.0
9   2   2454.86 2454.86 0.00    3000.0
10  2   830.00  830.00  0.00    3000.0
11  2   299.00  299.00  0.00    3000.0
12  2   14000.00    12000.00    2000.00 3000.0
13  2   1500.00 500.00  1000.00 3000.0