group the data as continuous block pandas-CodePudding

I had a panda DataFrame looks like

   start end
0  10    20
1  30    35
2  20    25
3  35    40
4  40    45
5  60    70
6  70    80

Expecting Output like

[{"start":10,"end":25},{"start":30,"end":45},{"start":60,"end":80}]

The output is calculated based on start from min of start and check the end value have any other start and take it and go continue check

CodePudding user response：

You can compute a group on the sorted items and convert to_dict:

df2 = df.sort_values(by=['start', 'end'])

group = df2['start'].ne(df2['end'].shift()).cumsum()
# or if overlapping
# group = df2['start'].gt(df2['end'].shift()).cumsum()

out = df2.groupby(group).agg({'start': 'min', 'end': 'max'}).to_dict('records')

output:

[{'start': 10, 'end': 25}, {'start': 30, 'end': 45}, {'start': 60, 'end': 80}]

overlapping

df = pd.DataFrame({'start': [10, 30, 18, 35, 40, 60, 70], 'end': [20, 35, 25, 40, 45, 70, 80]})

df2 = df.sort_values(by=['start', 'end'])
group = df2['start'].gt(df2['end'].shift()).cumsum()
df2.groupby(group).agg({'start': 'min', 'end': 'max'}).to_dict('records')