Home > Back-end >  Pandas recursive slicing to find min value based on condition
Pandas recursive slicing to find min value based on condition

Time:11-06

I do have python data frame as below. I am trying to slice the data frame where ever "slice" column value is 10 and then find the min of "low" column between 10 and previous non 0 value column

                   date      open      high       low     close  slice
0   2022-05-19 09:15:00  33461.00  33624.90  33403.20  33412.45      0
1   2022-05-19 09:20:00  33413.10  33450.65  33393.90  33429.10      0
2   2022-05-19 09:25:00  33433.20  33490.05  33421.95  33460.25      0
3   2022-05-19 09:30:00  33460.55  33509.40  33419.05  33489.80      0
4   2022-05-19 09:35:00  33492.20  33506.40  33450.30  33454.70     20
5   2022-05-19 09:40:00  33452.25  33452.95  33396.40  33436.15      0
6   2022-05-19 09:45:00  33434.30  33454.80  33401.35  33439.05      0
7   2022-05-19 09:50:00  33438.30  33482.85  33423.50  33477.30     10
8   2022-05-19 09:55:00  33480.60  33535.85  33462.40  33523.65     30
9   2022-05-19 10:00:00  33527.65  33527.65  33484.10  33521.40      0
10  2022-05-19 10:05:00  33519.35  33599.25  33505.95  33530.55      0
11  2022-05-19 10:10:00  33527.50  33544.20  33496.60  33538.65      0
12  2022-05-19 10:15:00  33540.15  33565.85  33522.75  33563.25      0
13  2022-05-19 10:20:00  33563.50  33582.45  33525.95  33539.25      0
14  2022-05-19 10:25:00  33537.25  33537.50  33511.80  33516.35     20
15  2022-05-19 10:30:00  33518.00  33561.80  33513.60  33528.55      0
16  2022-05-19 10:35:00  33527.80  33551.00  33527.55  33550.50     10
17  2022-05-19 10:40:00  33551.50  33573.60  33525.85  33537.45      0
18  2022-05-19 10:45:00  33534.80  33563.10  33510.75  33555.95      0
19  2022-05-19 10:50:00  33555.55  33573.45  33540.45  33541.00      0
20  2022-05-19 10:55:00  33545.40  33586.80  33542.75  33586.80     10

def buy_sell(data):
    Time = []
    SignalBuy = [] 
    BuySL = [] 
    BuyTgt = []  

    for i in range(len(data)):
                   if data["slice"][i] == 10:
                       Time.append(data["date"][i])
                       entry = data["open"][i]
                       sl = data["low"][i] - 10
                       SignalBuy.append(entry)
                       BuySL.append(sl)
                       BuyTgt.append(entry   (entry - sl) * 2)

    return pd.Series([Time, SignalBuy, BuyTgt, BuySL])

SignalDemand = pd.DataFrame()
SignalDemand["Time"], SignalDemand["Entry Price"], SignalDemand["Target"], SignalDemand["Stop Loss"]  = buy_sell(data)

print(SignalDemand.head())

This code gives me output as below.

                  Time  Entry Price    Target  Stop Loss
0  2022-05-19 09:50:00     33438.30  33487.90   33413.50
1  2022-05-19 10:35:00     33527.80  33548.30   33517.55
2  2022-05-19 10:55:00     33545.40  33570.70   33532.75

Expected output is

slice 1 - "Entry price" ( min of "open" or "close") ie 33434.30, "Stop loss" ( min of "low" - 10 ) ie ( 33396.40 - 10 = ) 33386.40

                   date      open      high       low     close  slice
4   2022-05-19 09:35:00  33492.20  33506.40  33450.30  33454.70     20
5   2022-05-19 09:40:00  33452.25  33452.95  33396.40  33436.15      0
6   2022-05-19 09:45:00  33434.30  33454.80  33401.35  33439.05      0
7   2022-05-19 09:50:00  33438.30  33482.85  33423.50  33477.30     10

Slice 2 - "Entry price" ( min of "open" or "close") ie 33516.35, "Stop loss" ( min of "low" - 10 ) ie ( 33511.80 - 10 = ) 33501.80

                   date      open      high       low     close  slice
14  2022-05-19 10:25:00  33537.25  33537.50  33511.80  33516.35     20
15  2022-05-19 10:30:00  33518.00  33561.80  33513.60  33528.55      0
16  2022-05-19 10:35:00  33527.80  33551.00  33527.55  33550.50     10

slice 3 - If previous value is same, i need to exclude the first row, ie row now 16 should be excluded.

"Entry price" ( min of "open" or "close") ie 33534.80, "Stop loss" ( min of "low" - 10 ) ie ( 33510.75 - 10 = ) 33500.75

                   date      open      high       low     close  slice
16  2022-05-19 10:35:00  33527.80  33551.00  33527.55  33550.50     10

17  2022-05-19 10:40:00  33551.50  33573.60  33525.85  33537.45      0
18  2022-05-19 10:45:00  33534.80  33563.10  33510.75  33555.95      0
19  2022-05-19 10:50:00  33555.55  33573.45  33540.45  33541.00      0
20  2022-05-19 10:55:00  33545.40  33586.80  33542.75  33586.80     10

Final output should be

                  Time  Entry Price    Target  Stop Loss
0  2022-05-19 09:50:00     33434.30  33530.10   33386.40
1  2022-05-19 10:35:00     33516.35  33545.45   33501.80
2  2022-05-19 10:55:00     33534.80  33602.90   33500.75

CodePudding user response:

I'm sure theres better ways to do this using vectorization, but the way you have it, you're not even polling the previous rows to compare it to the current value. There's multiple approaches to this problem. This one just sets the min value found between the current non 0 and the next non 0.

data['slice_min'] = None
data['slice_min_index_ref'] = None

def get_mins(data):

    min = None
    index_ref = None
    for i in range(len(data)):
        low = data.iloc[i]['low']
        if min == None: # Just setting up the first low
            min = low
        else:
            if low < min:
                #print('new low {}'.format(low))
                min = low # Setting new low
                index_ref = i 1 # in case you want to refer to the index.  1 because you used range
        if data.iloc[i]['slice'] > 0: # start over 
            data.at[i 1, 'slice_min'] = min # Set the min value only one values where a slice isn't 0. 
            data.at[i 1, 'slice_min_index_ref'] = index_ref
            index_ref = i 1
            min = low
    return data


df = get_mins(data)

df_slices = df[df['slice'] == 10] ## Just select the rows with slice 10
df_slices

Output

        date    open    high    low         close       slice   slice_min   slice_min_index_ref
0                               
7   2022-05-19 09:50:00 33438.3 33482.85    33423.50    33477.3 10  33396.4     5
16  2022-05-19 10:35:00 33527.8 33551.00    33527.55    33550.5 10  33511.8     14
20  2022-05-19 10:55:00 33545.4 33586.80    33542.75    33586.8 10  33510.75    18

CodePudding user response:

I also wanted to add you can do it this route. This one includes your end dataFrame you wanted.

df = data[data['slice'] > 0]

df2 = pd.DataFrame()

for i in range(len(df)): #
    if df.iloc[i]['slice'] == 10:
        if i == 0:
            pass # Ignore if first value in df is 10
        else:
            # querying original df using index references 
            min_row = data.iloc[df.index[i-1]:df.index[i]].min() 
            time = min_row['date']
            entry = min_row['open']
            sl = min_row['low'] - 10
            target = entry   (entry - sl) * 2 
            row = pd.Series([time, entry, target, sl])
            df2 = df2.append(row, ignore_index=True)
df2.columns = ['Time', 'Entry Price', 'Target', 'Stop Loss']
df2

Output

    Time                 Entry Price  Target    Stop Loss
0   2022-05-19 09:40:00  33434.3      33530.1   33386.40
1   2022-05-19 10:30:00  33518.0      33546.8   33503.60
2   2022-05-19 10:40:00  33534.8      33602.9   33500.75
  • Related