Avoiding loops in python/pandas

Time:10-06

I can do basic stuff in python/pandas, but I still struggle with the "no loops necessary" world of pandas. I tend to fall back on converting to lists and looping as in VBA, and then bringing those lists back into DataFrames. I know there is a simpler way, but I can't figure it out.

A simple example is a very basic strategy: create a signal of -1 when a series rises to 70 or above, and keep it at -1 until the series falls to 30 or below. At that point the signal changes to 1 and stays there until the series reaches 70 again, and so on. Before either threshold has been hit, the signal is 0.

I can do this with simple list looping, but I know this is far from "Pythonic"! Can anyone help me translate this into nicer code without loops?

import pandas as pd

# rsi_list is just a list from a df column of numbers. Simple example:
rsi = {'rsi': [35, 45, 75, 56, 34, 29, 26, 34, 67, 78]}
rsi = pd.DataFrame(rsi)
rsi_list = rsi['rsi'].tolist()

signal_list=[]
hasShort=0
hasLong=0
for i in range(len(rsi_list)-1):           
    if rsi_list[i] >= 70 or hasShort==1:
        signal_list.append(-1)
    
        if rsi_list[i + 1] >= 30:
            hasShort=1
        else:
            hasShort=0
    
    elif rsi_list[i] <= 30 or hasLong==1:
        signal_list.append(1)
        
        if rsi_list[i + 1] <= 70:
            hasLong=1
        else:
            hasLong=0
    else:
        signal_list.append(0)

# Last part: pad the list to the same length as the original df, since I put it back as a column
if rsi_list[-1]>=70:
    signal_list.append(-1)
else:
    signal_list.append(1)
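Stated without the look-ahead flags, the rule is a three-state machine: -1 after a value at or above 70, 1 after a value at or below 30, and 0 before either threshold has been touched. The following is a minimal sketch of that rule as a plain loop, kept only as a reference to check vectorized versions against. `reference_signal` is a hypothetical helper name, and inclusive thresholds are assumed as in the code above.

```python
import pandas as pd


def reference_signal(values, lower=30, upper=70):
    """Hypothetical reference helper (not from the question).

    -1 once a value reaches `upper`, 1 once a value reaches `lower`,
    0 before either threshold is hit; the last state is held in between.
    """
    state = 0  # no signal until the first threshold is touched
    out = []
    for v in values:
        if v >= upper:
            state = -1
        elif v <= lower:
            state = 1
        out.append(state)
    return out


rsi = pd.DataFrame({'rsi': [35, 45, 75, 56, 34, 29, 26, 34, 67, 78]})
rsi['signal_ref'] = reference_signal(rsi['rsi'])
print(rsi['signal_ref'].tolist())  # [0, 0, -1, -1, -1, 1, 1, 1, 1, -1]
```

On the sample data this agrees with the loop above. It is not identical in every case: after a -1 signal, a value of exactly 30 gives 1 here, while the original loop's look-ahead check keeps -1. It also treats the last element like any other, whereas the padding step above appends 1 whenever the last value is below 70.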

Answer:

First clip the values to a lower bound of 30 and an upper bound of 70, then use where to turn every value that is not exactly 30 or 70 into NaN. Replace 30 with 1 and 70 with -1, and propagate these values forward with ffill. Finally, use fillna to set the values before the first 30 or 70 to 0.

rsi['rsi_cut'] = (
    rsi['rsi'].clip(lower=30,upper=70)
       .where(lambda x: x.isin([30,70]))
       .replace({30:1, 70:-1})
       .ffill()
       .fillna(0)
)
print(rsi)
   rsi  rsi_cut
0   35      0.0
1   45      0.0
2   75     -1.0
3   56     -1.0
4   34     -1.0
5   29      1.0
6   26      1.0
7   34      1.0
8   67      1.0
9   78     -1.0
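To see why the chain works, it can help to keep each intermediate result as its own column. This is a sketch on the same sample data; the column names `clipped`, `hits`, `mapped` and `filled` are arbitrary labels chosen here, not part of the answer.

```python
import pandas as pd

rsi = pd.DataFrame({'rsi': [35, 45, 75, 56, 34, 29, 26, 34, 67, 78]})

# Each stage of the chain stored as a separate column for inspection
steps = rsi.copy()
steps['clipped'] = steps['rsi'].clip(lower=30, upper=70)            # pin extremes to exactly 30 / 70
steps['hits'] = steps['clipped'].where(lambda x: x.isin([30, 70]))  # NaN unless a threshold was touched
steps['mapped'] = steps['hits'].replace({30: 1, 70: -1})            # 30 -> buy (1), 70 -> sell (-1)
steps['filled'] = steps['mapped'].ffill()                           # carry the last signal forward
steps['signal'] = steps['filled'].fillna(0)                         # 0 before the first threshold
print(steps)
```

The `clip` step is what turns "at or beyond a threshold" into an exact-match test, so `isin([30, 70])` can pick out the bars that touch a threshold; everything else becomes NaN and is filled from the previous signal.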

Edit: maybe a bit easier: use le (less or equal) against 30 and ge (greater or equal) against 70, subtract one from the other, then replace the 0s by forward-filling the last non-zero value.

print((rsi['rsi'].le(30).astype(int) - rsi['rsi'].ge(70).astype(int))
       .mask(lambda x: x.eq(0))  # treat 0 as "no new signal"
       .ffill()
       .fillna(0).astype(int))
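The subtraction approach can also be wrapped as a reusable function with configurable thresholds. This is a sketch only: `rsi_signal` and its `lower`/`upper` parameters are hypothetical names, thresholds are assumed inclusive (`<= lower`, `>= upper`) as in the question, and the 0s are filled with `mask` + `ffill` because the `method` argument of `Series.replace` is deprecated in pandas 2.1+. The last lines cross-check it against the clip-based version.

```python
import pandas as pd


def rsi_signal(s: pd.Series, lower: float = 30, upper: float = 70) -> pd.Series:
    """Hypothetical helper: 1 after touching `lower`, -1 after touching `upper`,
    0 before either threshold has been reached."""
    raw = s.le(lower).astype(int) - s.ge(upper).astype(int)  # 1 / -1 on threshold bars, else 0
    # Blank out the 0s, carry the last signal forward, and use 0 before the first signal
    return raw.mask(raw.eq(0)).ffill().fillna(0).astype(int)


rsi = pd.DataFrame({'rsi': [35, 45, 75, 56, 34, 29, 26, 34, 67, 78]})
rsi['signal'] = rsi_signal(rsi['rsi'])

# Cross-check against the clip/where/ffill chain from the answer
clip_version = (
    rsi['rsi'].clip(lower=30, upper=70)
       .where(lambda x: x.isin([30, 70]))
       .replace({30: 1, 70: -1})
       .ffill()
       .fillna(0)
)
print((rsi['signal'] == clip_version).all())  # True
```

Comparing the two versions on the same data is a cheap way to catch threshold mistakes such as using `gt` where `ge` was intended.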