I have a requirement where I need to calculate the percent change for a order group. What I have done so far works well if there are equal number of rows for sub group within primary group. I need to take into consideration the quantity as well.
time txn_type symbol qty price
27/12/21 10:32 BUY XYZ 1 4054.5
27/12/21 10:26 SELL XYZ 2 4053.65
27/12/21 10:00 BUY XYZ 1 4072.25
27/12/21 09:56 BUY XYZ 1 4045.15
27/12/21 09:50 SELL XYZ 1 4034.25
27/12/21 09:40 BUY XYZ 1 4006
27/12/21 09:20 SELL XYZ 1 3978.1
27/12/21 10:55 SELL MNO 1 1714.95
27/12/21 10:25 BUY PQR 1 768.7
27/12/21 10:05 SELL PQR 1 765.05
27/12/21 09:57 SELL PQR 1 764
27/12/21 09:40 BUY PQR 1 769
27/12/21 09:28 SELL PQR 1 765.8
27/12/21 09:20 BUY PQR 1 768.95
27/12/21 09:20 BUY MNO 1 1703.55
symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
'symbol': 'first',
'txn_type': 'first',
'price': np.sum
})
symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
lambda g: round(((g.shift(-1) - g) / g) * 100, 2))
symbol_percent_df.reset_index(inplace=True)
symbol_percent_df = symbol_percent_df[symbol_percent_df['txn_type'] == "BUY"]
symbol_percent_df.sort_values(by=['price'], ascending=False, inplace=True)
symbol_pct_dict: dict = symbol_percent_df.set_index('symbol')['price'].to_dict()
Above code works well for MNO, PQR but gives incorrect result for XYZ as qty varied for one row at 10:26.
What I need is symbol wise percent change in a dictionary.
CodePudding user response:
Solution is repeat rows by qty
, so get double 1
row here:
order_df = order_df.loc[order_df.index.repeat(order_df['qty'])]
print (order_df)
time txn_type symbol qty price
0 27/12/21 10:32 BUY XYZ 1 4054.50
1 27/12/21 10:26 SELL XYZ 2 4053.65
1 27/12/21 10:26 SELL XYZ 2 4053.65
2 27/12/21 10:00 BUY XYZ 1 4072.25
3 27/12/21 09:56 BUY XYZ 1 4045.15
4 27/12/21 09:50 SELL XYZ 1 4034.25
5 27/12/21 09:40 BUY XYZ 1 4006.00
6 27/12/21 09:20 SELL XYZ 1 3978.10
7 27/12/21 10:55 SELL MNO 1 1714.95
8 27/12/21 10:25 BUY PQR 1 768.70
9 27/12/21 10:05 SELL PQR 1 765.05
10 27/12/21 09:57 SELL PQR 1 764.00
11 27/12/21 09:40 BUY PQR 1 769.00
12 27/12/21 09:28 SELL PQR 1 765.80
13 27/12/21 09:20 BUY PQR 1 768.95
14 27/12/21 09:20 BUY MNO 1 1703.55
symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
'symbol': 'first',
'txn_type': 'first',
'price': np.sum
})
print (symbol_orders_df)
symbol txn_type price
symbol txn_type
MNO BUY MNO BUY 1703.55
SELL MNO SELL 1714.95
PQR BUY PQR BUY 2306.65
SELL PQR SELL 2294.85
XYZ BUY XYZ BUY 16177.90
SELL XYZ SELL 16119.65
symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
lambda g: round(((g.shift(-1) - g) / g) * 100, 2))
print (symbol_percent_df)
price
symbol txn_type
MNO BUY 0.67
SELL NaN
PQR BUY -0.51
SELL NaN
XYZ BUY -0.36
SELL NaN
Another idea is multiple columns:
order_df['price'] = order_df['price'].mul(order_df['qty'])
print (order_df)
time txn_type symbol qty price
0 27/12/21 10:32 BUY XYZ 1 4054.50
1 27/12/21 10:26 SELL XYZ 2 8107.30
2 27/12/21 10:00 BUY XYZ 1 4072.25
3 27/12/21 09:56 BUY XYZ 1 4045.15
4 27/12/21 09:50 SELL XYZ 1 4034.25
5 27/12/21 09:40 BUY XYZ 1 4006.00
6 27/12/21 09:20 SELL XYZ 1 3978.10
7 27/12/21 10:55 SELL MNO 1 1714.95
8 27/12/21 10:25 BUY PQR 1 768.70
9 27/12/21 10:05 SELL PQR 1 765.05
10 27/12/21 09:57 SELL PQR 1 764.00
11 27/12/21 09:40 BUY PQR 1 769.00
12 27/12/21 09:28 SELL PQR 1 765.80
13 27/12/21 09:20 BUY PQR 1 768.95
14 27/12/21 09:20 BUY MNO 1 1703.55
symbol_orders_df = order_df.groupby(['symbol', 'txn_type'])['price'].sum()
print (symbol_orders_df)
symbol txn_type
MNO BUY 1703.55
SELL 1714.95
PQR BUY 2306.65
SELL 2294.85
XYZ BUY 16177.90
SELL 16119.65
Name: price, dtype: float64
symbol_percent_df = (symbol_orders_df.groupby(level=[0]).shift(-1)
.sub(symbol_orders_df)
.div(symbol_orders_df)
.mul(100)
.round(2)
.to_frame())
print (symbol_percent_df)
price
symbol txn_type
MNO BUY 0.67
SELL NaN
PQR BUY -0.51
SELL NaN
XYZ BUY -0.36
SELL NaN