Find overlapping or inserted rows in python

Time:09-12

I have a simple DataFrame:

    start  end
0      30   40
1      45   55
2      50   60
3      53   64
4      65   70
5      75   80
6      77   85
7      80   83
8      90  120
9      95  100
10    105  110
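
For anyone who wants to reproduce this, the frame above can be built as follows. This is only a sketch: the variable name `df` is an assumption, chosen because the answer below refers to it.

```python
import pandas as pd

# Same rows as the table above; the default RangeIndex gives 0..10.
df = pd.DataFrame({
    "start": [30, 45, 50, 53, 65, 75, 77, 80, 90, 95, 105],
    "end":   [40, 55, 60, 64, 70, 80, 85, 83, 120, 100, 110],
})
print(df)
```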

Some rows lie entirely inside another row, and others overlap a neighbouring row. I want to merge every such group into a single row, so the DataFrame becomes:

   start  end
0     30   40
1     45   64
2     65   70
3     75   85
4     90  120

I drew a picture for a better explanation (hope it helps): [image not included]

Answer:

Use a custom generator function that merges the intervals, then pass its output to the DataFrame constructor:

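import pandas as pd

# Adapted from https://stackoverflow.com/a/5679899/2901002
def merge(times):
    # Normalize each pair to (low, high), then sort by start value.
    intervals = sorted(sorted(t) for t in times)
    if not intervals:
        return
    # Start from the earliest interval, not from the first input row,
    # so unsorted input is handled correctly.
    saved = list(intervals[0])
    for st, en in intervals[1:]:
        if st <= saved[1]:
            # Overlaps or touches the current interval: extend it.
            saved[1] = max(saved[1], en)
        else:
            # Gap found: emit the finished interval and start a new one.
            yield tuple(saved)
            saved = [st, en]
    yield tuple(saved)

df1 = pd.DataFrame(merge(df[['start','end']].to_numpy()), columns=['start','end'])
print(df1)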
   start  end
0     30   40
1     45   64
2     65   70
3     75   85
4     90  120
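
As an alternative to the loop, the same merge can be sketched with pandas operations only. This is a different technique from the one in the answer above (a running maximum plus a cumulative group counter), and it rests on two assumptions: every row already has `start <= end`, and intervals that merely touch should be merged, as the `<=` comparison above does. The function name `merge_intervals` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "start": [30, 45, 50, 53, 65, 75, 77, 80, 90, 95, 105],
    "end":   [40, 55, 60, 64, 70, 80, 85, 83, 120, 100, 110],
})

def merge_intervals(frame):
    # Hypothetical helper; assumes start <= end in every row.
    s = frame.sort_values("start").reset_index(drop=True)
    # Largest end value among all earlier rows (NaN for the first row).
    prev_max_end = s["end"].cummax().shift()
    # A new group begins when a row starts after every earlier row has ended.
    group = (s["start"] > prev_max_end).cumsum()
    return (
        s.groupby(group)
         .agg(start=("start", "min"), end=("end", "max"))
         .reset_index(drop=True)
    )

result = merge_intervals(df)
print(result)
```

The running maximum matters because of nested rows: row 7 (80 to 83) ends before row 6 (77 to 85), so comparing each start only with the previous row's end would not be enough in general.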