Home > Software engineering >  Python: sum values in column where condition is met
Python: sum values in column where condition is met

Time:03-11

I have a DataFrame like this:

import pandas as pd

df=pd.DataFrame()
df['exchange'] = [1, 1, 1, 2, 3]
df['type'] = ['deposit', 'deposit', 'trade', 'deposit', 'deposit']
df['value'] = [10, 10, '30', '40', '100']

which looks like:

    exchange     type       value
 0  1            deposit    10
 1  1            deposit    10
 2  1            trade      30
 3  2            deposit    40
 4  3            deposit    100

I want to add the elements in the "value" column where "type"='deposit' based on the "exchange" and forward-fill to get something like this:

    exchange     type       value   balance
 0  1            deposit    10      10
 1  1            deposit    10      20
 2  1            trade      30      20
 3  2            deposit    40      40
 4  3            deposit    100     100

where "balance" is the sum of deposits filtered by "exchange".

Is there a way to do this pythonically without for loops/if statements?

CodePudding user response:

You can first group by "exchange", then apply np.cumsum and finally assign the result where type is "deposit".

import pandas as pd
import numpy as np

df.loc[df["type"]=="deposit", "balance"] = df.loc[df["type"]=="deposit"].groupby("exchange", sort=False)["value"].apply(np.cumsum)

Finally you can fill missing value with the forward-fill as you have mentioned.

df = df.fillna(method='ffill')

CodePudding user response:

You could use groupby cumsum to fill in "balance" column; this fills in for "trade"s as well, so you mask the result depending on with type is "deposit" or not and forward fill using ffill if it's not "deposit":

df['value'] = df['value'].astype(int)
df['balance'] = df['value'].mask(df['type']!='deposit').groupby(df['exchange']).cumsum().ffill()

Output:

   exchange     type  value  balance
0         1  deposit     10     10.0
1         1  deposit     10     20.0
2         1    trade     30     20.0
3         2  deposit     40     40.0
4         3  deposit    100    100.0

CodePudding user response:

You use where to NaN non deposit rows and then use an expanding sum, within each exhange group as it considers NaN 0 when summing so it winds up forward filling just as you want.

df['balance'] = (df['value'].where(df['type'].eq('deposit'))
                   .groupby(df['exchange'])
                   .expanding().sum()
                   .reset_index(0, drop=True))

   exchange     type  value  balance
0         1  deposit     10     10.0
1         1  deposit     10     20.0
2         1    trade     30     20.0
3         2  deposit     40     40.0
4         3  deposit    100    100.0
  • Related