How to effectively loop within groups in pandas?


I have a table like this

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({'date': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                             'high': [10, 9, 8, 8, 7, 6, 7, 8, 9, 10],
                             'low': [9, 7, 6, 5, 2, 1, 2, 1, 8, 9],
                             'stock': ['A']*5 + ['B']*5})
date  high  low  stock
   1    10    9      A
   2     9    7      A
   3     8    6      A
   4     8    5      A
   5     7    2      A
   6     6    1      B
   7     7    2      B
   8     8    1      B
   9     9    8      B
  10    10    9      B

For each day of each stock, I would like to know the maximum difference between today's "high" and any "low" on or after today. For example, on date 1, stock A's high price is $10. I look at dates 1-5 and find that the largest difference between that high and a later low occurs on date 5, so the result for date 1 is 10 - 2 = 8. On date 2, I should only look at "low" values from date 2 onward.
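In other words, within each stock group: diff_high_low[i] = high[i] - min(low[i], low[i+1], ..., low[last row of the group]).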

Results:

date  high  low  stock  diff_high_low
   1    10    9      A              8
   2     9    7      A              7
   3     8    6      A              6
   4     8    5      A              6
   5     7    2      A              5
   6     6    1      B              5
   7     7    2      B              6
   8     8    1      B              7
   9     9    8      B              1
  10    10    9      B              1

I am currently using a for-loop and it works, but it is really slow on my 1-million-row table. Is there a better way to do it?

My current method:

diff_high_low = []
for gname, g in df.groupby('stock'):
    rows = g.shape[0]
    for i in range(rows):
        # today's high minus every low from today to the end of the group;
        # the largest of those differences is the answer for row i
        diff_high_low.append(max(g['high'].iloc[i] - g['low'].iloc[i:rows]))
df['diff_high_low'] = diff_high_low

CodePudding user response:

We need groupby with cummin on the reversed frame: taken bottom-up, the cumulative minimum of "low" within each stock is the lowest "low" on or after each date.

df['high'] - df.iloc[::-1].groupby('stock')['low'].cummin()
Out[273]: 
0    8
1    7
2    6
3    6
4    5
5    5
6    6
7    7
8    1
9    1
dtype: int64

df['diff_high_low'] = df['high'] - df.iloc[::-1].groupby('stock')['low'].cummin()
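
For anyone who wants to check this end to end, here is a minimal self-contained sketch of the same idea using the sample data from the question (the intermediate name min_low_onward is just for readability): reversing the frame turns "lowest low on or after each date" into a per-group cumulative minimum, which pandas computes in a vectorized pass instead of a Python loop.

import pandas as pd

df = pd.DataFrame({'date': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'high': [10, 9, 8, 8, 7, 6, 7, 8, 9, 10],
                   'low': [9, 7, 6, 5, 2, 1, 2, 1, 8, 9],
                   'stock': ['A']*5 + ['B']*5})

# Walk the rows bottom-up so cummin() per stock yields, for each row,
# the minimum 'low' from that date through the end of its group.
min_low_onward = df.iloc[::-1].groupby('stock')['low'].cummin()

# The subtraction aligns on the index, so the reversed row order does not matter.
df['diff_high_low'] = df['high'] - min_low_onward

print(df)

On a large table this replaces the per-row slicing of the nested loop with two linear passes (reverse plus cumulative minimum), which is where the speedup comes from.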