Generate Boolean Matrix of overlapping intervals-CodePudding

I have a dataframe where two columns represent the start and end points of intervals on a real number line. I want to generate a third column as a list of the indices of rows which said row has any overlap with. I'm having difficulty creating a inequality boolean matrix for this natively in pandas. I assume logic like this s1<=e2 and e1>=s2 will do the trick, but I don't know how to effectively broadcast it.

As a toy example I'm hoping for a simple way to at least generate a 5x5 boolean matrix (with all True down the diagonal) given this dataframe:

import pandas as pd
intervals_df = pd.DataFrame({"Starts":[0,1,5,10,15,20],"Ends":[4,2,9,14,19,24]})

Starts  Ends
0   0   4
1   1   2
2   5   9
3   10  14
4   15  19
5   20  24

CodePudding user response：

The condition for the two intervals (s1,e1) and (s2,e2) to intersect is max(s1,s2) <= min(e1,e2). So you can do a cross merge (this is the broadcast), calculate the condition, the pivot:

d = (intervals_df.reset_index()
       .merge(intervals_df.reset_index(), how='cross')
       .assign(cond=lambda x: x.filter(like='Starts').max(axis=1) <= x.filter(like='Ends').min(axis=1))
        .pivot('index_x', 'index_y', 'cond')
    )

You would get:

index_y      0      1      2      3      4      5
index_x                                          
0         True   True  False  False  False  False
1         True   True  False  False  False  False
2        False  False   True  False  False  False
3        False  False  False   True  False  False
4        False  False  False  False   True  False
5        False  False  False  False  False   True

Or you can make do with numpy's broadcasting:

starts = intervals_df[['Starts']].to_numpy()
ends = intervals_df[['Ends']].to_numpy()

np.maximum(starts, starts.T) <= np.minimum(ends, ends.T)

Output:

array([[ True,  True, False, False, False, False],
       [ True,  True, False, False, False, False],
       [False, False,  True, False, False, False],
       [False, False, False,  True, False, False],
       [False, False, False, False,  True, False],
       [False, False, False, False, False,  True]])