I have a table like this:

import pandas as pd

df = pd.DataFrame.from_dict({'date': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                             'high': [10, 9, 8, 8, 7, 6, 7, 8, 9, 10],
                             'low': [9, 7, 6, 5, 2, 1, 2, 1, 8, 9],
                             'stock': ['A'] * 5 + ['B'] * 5})
date | high | low | stock |
---|---|---|---|
1 | 10 | 9 | A |
2 | 9 | 7 | A |
3 | 8 | 6 | A |
4 | 8 | 5 | A |
5 | 7 | 2 | A |
6 | 6 | 1 | B |
7 | 7 | 2 | B |
8 | 8 | 1 | B |
9 | 9 | 8 | B |
10 | 10 | 9 | B |
For each day of each stock, I would like to know the maximum difference between that day's "high" and any "low" on that day or later. For example, on date 1, stock A's high is $10; looking across dates 1-5, the lowest "low" is 2 (on date 5), so the result for date 1 is 10 - 2 = 8. On date 2, I should only consider "low" from date 2 onward. In other words, within each stock, diff_high_low[i] = high[i] - min(low[i:]).
Results:
date | high | low | stock | diff_high_low |
---|---|---|---|---|
1 | 10 | 9 | A | 8 |
2 | 9 | 7 | A | 7 |
3 | 8 | 6 | A | 6 |
4 | 8 | 5 | A | 6 |
5 | 7 | 2 | A | 5 |
6 | 6 | 1 | B | 5 |
7 | 7 | 2 | B | 6 |
8 | 8 | 1 | B | 7 |
9 | 9 | 8 | B | 1 |
10 | 10 | 9 | B | 1 |
I am currently using a for loop and it works, but it is really slow on my 1-million-row table. Is there a better way to do this?
My current method:
diff_high_low = []
for gname, g in df.groupby('stock'):
    rows = g.shape[0]
    for i in range(rows):
        # high of day i minus the smallest low from day i to the end of the group
        diff_high_low.append(max(g['high'].iloc[i] - g['low'].iloc[i:rows]))
df['diff_high_low'] = diff_high_low
CodePudding user response:
Reverse the frame and use groupby with cummin: walking the rows backwards turns "minimum low from today onward" into a cumulative minimum within each stock, and pandas realigns on the index during the subtraction, so the original row order is restored.
df['high'] - df.iloc[::-1].groupby('stock')['low'].cummin()
Out[273]:
0    8
1    7
2    6
3    6
4    5
5    5
6    6
7    7
8    1
9    1
dtype: int64

df['diff_high_low'] = df['high'] - df.iloc[::-1].groupby('stock')['low'].cummin()
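
If you prefer to keep the reversal inside the groupby, an equivalent formulation uses transform to reverse each group, take the running minimum, and reverse back. This is just a sketch of an alternative, and suffix_low is an illustrative name:

# per-row minimum of 'low' from that row to the end of its stock's group:
# reverse the group, take the cumulative minimum, then restore the order
suffix_low = df.groupby('stock')['low'].transform(lambda s: s[::-1].cummin()[::-1])
df['diff_high_low'] = df['high'] - suffix_low

Both versions produce the same column; the single reversed cummin avoids calling a Python lambda once per group, so it should be the faster of the two on a million rows.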