pandas - calculation of percent change for a sub-group within a group-CodePudding

I have a requirement where I need to calculate the percent change for a order group. What I have done so far works well if there are equal number of rows for sub group within primary group. I need to take into consideration the quantity as well.

time            txn_type symbol qty price
27/12/21 10:32  BUY      XYZ    1   4054.5
27/12/21 10:26  SELL     XYZ    2   4053.65
27/12/21 10:00  BUY      XYZ    1   4072.25
27/12/21 09:56  BUY      XYZ    1   4045.15
27/12/21 09:50  SELL     XYZ    1   4034.25
27/12/21 09:40  BUY      XYZ    1   4006
27/12/21 09:20  SELL     XYZ    1   3978.1
27/12/21 10:55  SELL     MNO    1   1714.95
27/12/21 10:25  BUY      PQR    1   768.7
27/12/21 10:05  SELL     PQR    1   765.05
27/12/21 09:57  SELL     PQR    1   764
27/12/21 09:40  BUY      PQR    1   769
27/12/21 09:28  SELL     PQR    1   765.8
27/12/21 09:20  BUY      PQR    1   768.95
27/12/21 09:20  BUY      MNO    1   1703.55


symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
                            'symbol': 'first',
                            'txn_type': 'first',
                            'price': np.sum
                        })
symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
                            lambda g: round(((g.shift(-1) - g) / g) * 100, 2))
symbol_percent_df.reset_index(inplace=True)
symbol_percent_df = symbol_percent_df[symbol_percent_df['txn_type'] == "BUY"]
symbol_percent_df.sort_values(by=['price'], ascending=False, inplace=True)
symbol_pct_dict: dict = symbol_percent_df.set_index('symbol')['price'].to_dict()

Above code works well for MNO, PQR but gives incorrect result for XYZ as qty varied for one row at 10:26.

What I need is symbol wise percent change in a dictionary.

CodePudding user response：

Solution is repeat rows by qty, so get double 1 row here:

order_df = order_df.loc[order_df.index.repeat(order_df['qty'])]

print (order_df)
              time txn_type symbol  qty    price
0   27/12/21 10:32      BUY    XYZ    1  4054.50
1   27/12/21 10:26     SELL    XYZ    2  4053.65
1   27/12/21 10:26     SELL    XYZ    2  4053.65
2   27/12/21 10:00      BUY    XYZ    1  4072.25
3   27/12/21 09:56      BUY    XYZ    1  4045.15
4   27/12/21 09:50     SELL    XYZ    1  4034.25
5   27/12/21 09:40      BUY    XYZ    1  4006.00
6   27/12/21 09:20     SELL    XYZ    1  3978.10
7   27/12/21 10:55     SELL    MNO    1  1714.95
8   27/12/21 10:25      BUY    PQR    1   768.70
9   27/12/21 10:05     SELL    PQR    1   765.05
10  27/12/21 09:57     SELL    PQR    1   764.00
11  27/12/21 09:40      BUY    PQR    1   769.00
12  27/12/21 09:28     SELL    PQR    1   765.80
13  27/12/21 09:20      BUY    PQR    1   768.95
14  27/12/21 09:20      BUY    MNO    1  1703.55

symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
                            'symbol': 'first',
                            'txn_type': 'first',
                            'price': np.sum
                        })

print (symbol_orders_df)
                symbol txn_type     price
symbol txn_type                          
MNO    BUY         MNO      BUY   1703.55
       SELL        MNO     SELL   1714.95
PQR    BUY         PQR      BUY   2306.65
       SELL        PQR     SELL   2294.85
XYZ    BUY         XYZ      BUY  16177.90
       SELL        XYZ     SELL  16119.65

symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
                            lambda g: round(((g.shift(-1) - g) / g) * 100, 2))

print (symbol_percent_df)
                 price
symbol txn_type       
MNO    BUY        0.67
       SELL        NaN
PQR    BUY       -0.51
       SELL        NaN
XYZ    BUY       -0.36
       SELL        NaN

Another idea is multiple columns:

order_df['price'] = order_df['price'].mul(order_df['qty'])

print (order_df)
              time txn_type symbol  qty    price
0   27/12/21 10:32      BUY    XYZ    1  4054.50
1   27/12/21 10:26     SELL    XYZ    2  8107.30
2   27/12/21 10:00      BUY    XYZ    1  4072.25
3   27/12/21 09:56      BUY    XYZ    1  4045.15
4   27/12/21 09:50     SELL    XYZ    1  4034.25
5   27/12/21 09:40      BUY    XYZ    1  4006.00
6   27/12/21 09:20     SELL    XYZ    1  3978.10
7   27/12/21 10:55     SELL    MNO    1  1714.95
8   27/12/21 10:25      BUY    PQR    1   768.70
9   27/12/21 10:05     SELL    PQR    1   765.05
10  27/12/21 09:57     SELL    PQR    1   764.00
11  27/12/21 09:40      BUY    PQR    1   769.00
12  27/12/21 09:28     SELL    PQR    1   765.80
13  27/12/21 09:20      BUY    PQR    1   768.95
14  27/12/21 09:20      BUY    MNO    1  1703.55

symbol_orders_df = order_df.groupby(['symbol', 'txn_type'])['price'].sum()
print (symbol_orders_df)
symbol  txn_type
MNO     BUY          1703.55
        SELL         1714.95
PQR     BUY          2306.65
        SELL         2294.85
XYZ     BUY         16177.90
        SELL        16119.65
Name: price, dtype: float64

symbol_percent_df = (symbol_orders_df.groupby(level=[0]).shift(-1)
                                 .sub(symbol_orders_df)
                                 .div(symbol_orders_df)
                                 .mul(100)
                                 .round(2)
                                 .to_frame())

print (symbol_percent_df)
                 price
symbol txn_type       
MNO    BUY        0.67
       SELL        NaN
PQR    BUY       -0.51
       SELL        NaN
XYZ    BUY       -0.36
       SELL        NaN