Home > OS >  Sum values based on two conditions
Sum values based on two conditions

Time:03-31

I have the following DataFrame:

df = pd.DataFrame()
df['account'] = [1,1,1,1,1,2,2,2,2,2]
df['type'] = ['deposit', 'deposit', 'deposit', 'trade', 'trade','deposit', 'deposit', 'trade', 'trade', 'trade']
df['asset'] = ['A','A','B','A','B','A','C','A','A','C']
df['amount'] = [100,110,200,150,230,120,310,130,125,320]
df['price'] = [10,8, 20, 12, 23,15,5, 18, 15, 7]
df:
    account type    asset   amount  price
0   1       deposit A       100     10
1   1       deposit A       110     8
2   1       deposit B       200     20
3   1       trade   A       150     12
4   1       trade   B       230     23
5   2       deposit A       120     15
6   2       deposit C       310     5
7   2       trade   A       130     18
8   2       trade   A       125     15
9   2       trade   C       320     7

I want to create an extra column called holdPortfolio which is basically a buy and hold strategy. So the final DataFrame should look like this:

df_final:
    account type    asset   amount  price  holdPortfolio
0   1       deposit A       100     10     1000
1   1       deposit A       110     8      1880
2   1       deposit B       200     20     4000
3   1       trade   A       150     12     2520
4   1       trade   B       230     23     4600
5   2       deposit A       120     15     1800
6   2       deposit C       310     5      1550
7   2       trade   A       130     18     2160
8   2       trade   A       125     15     1800
9   2       trade   C       320     7      2170

Explanation: In index 0, the 1000 comes from 100*10.

In index 1, the 1880 comes from (100*10) (110*8). I sum them up because they are both asset A.

In index 2, the 4000 comes from 200*20.

In index 3, the 2520 comes from (100 110)*12. This is the total amount of asset A that I have deposited multiplied by the price at that index.

In index 4, the 4600 comes from 200*23. This is the total amount of asset B that I have deposited multiplied by the price at that index. and so on...

Is there a way to get these results without writing multiple for loops for each account and each asset? I basically want a cumulative sum of my deposits for each asset in each account, multiplied by the current price of that asset.

CodePudding user response:

Compute 2 Series, one for the cumulated deposits and one for the trades, then combine:

s = df['amount'].mask(df['type'].eq('trade'))

s2 = s.fillna(0).groupby([df['account'], df['asset']]).cumsum()

df['holdPortfolio'] = s.mul(df['price']).fillna(s2.mul(df['price']))

Output:

   account     type asset  amount  price  holdPortfolio
0        1  deposit     A     100     10           1000
1        1  deposit     A     110      8            880
2        1  deposit     B     200     20           4000
3        1    trade     A     150     12           2520
4        1    trade     B     230     23           4600
5        2  deposit     A     120     15           1800
6        2  deposit     C     310      5           1550
7        2    trade     A     130     18           2160
8        2    trade     A     125     15           1800
9        2    trade     C     320      7           2170
  • Related