Home > database >  How to find the number of previous rows in which the range of values in two columns intersects with
How to find the number of previous rows in which the range of values in two columns intersects with

Time:12-15

I have I have data frame of rows with the values of open_time and close_time. I need to create column open_lvl_value where, for each row, will be shown sum of previous rows with intersected range of values. Probably I didn't explain it clear so I attached expectation and Gantt chart.

import pandas as pd

data = [[40, 110],[50, 110],[70, 110],[90, 110],
        [100, 110],[40, 70],[50, 70],[50, 70],
        [60, 70],[60, 70],[100, 110],[130, 180],
        [150, 180],[150, 180]]
df = pd.DataFrame(data, columns=['open_time','close_time'])
print(df)
    open_time  close_time
0          40         110
1          50         110
2          70         110
3          90         110
4         100         110
5          40          70
6          50          70
7          50          70
8          60          70
9          60          70
10        100         110
11        130         180
12        150         180
13        150         180

expected result:

    open_time  close_time  open_lvl_value
0          40         110               1
1          50         110               2
2          70         110               3
3          90         110               4
4         100         110               5
5          40          70               4
6          50          70               5
7          50          70               6
8          60          70               7
9          60          70               8
10        100         110               5
11        130         180               1
12        150         180               2
13        150         180               3

Gantt chart

CodePudding user response:

You can try this:

import pandas as pd

data = [[40, 110],[50, 110],[70, 110],[90, 110],
        [100, 110],[40, 70],[50, 70],[50, 70],
        [60, 70],[60, 70],[100, 110],[130, 180],
        [150, 180],[150, 180]]
df = pd.DataFrame(data, columns=['open_time','close_time'])


def foo(df):
    df['open_lvl_value'] = 0
    for row in df.itertuples():
        start = row.open_time
        end = row.close_time
        df_copy = df.loc[:row.Index]
        s1 = df_copy[(df_copy.close_time >= end) & (df_copy.open_time <= end)].shape[0]
        s2 = df_copy[(df_copy.open_time <= start) & df_copy.close_time >= start].shape[0]
        df.loc[row.Index, 'open_lvl_value']  = max(s1, s2)
    return df


print(foo(df))

Out:
    open_time  close_time  open_lvl_value
0          40         110               1
1          50         110               2
2          70         110               3
3          90         110               4
4         100         110               5
5          40          70               4
6          50          70               5
7          50          70               6
8          60          70               7
9          60          70               8
10        100         110               6
11        130         180               1
12        150         180               2
13        150         180               3
  • Related