Home > Software design >  group the data as continuous block pandas
group the data as continuous block pandas

Time:06-10

I had a panda DataFrame looks like

   start end
0  10    20
1  30    35
2  20    25
3  35    40
4  40    45
5  60    70
6  70    80

Expecting Output like

[{"start":10,"end":25},{"start":30,"end":45},{"start":60,"end":80}]

The output is calculated based on start from min of start and check the end value have any other start and take it and go continue check

CodePudding user response:

You can compute a group on the sorted items and convert to_dict:

df2 = df.sort_values(by=['start', 'end'])

group = df2['start'].ne(df2['end'].shift()).cumsum()
# or if overlapping
# group = df2['start'].gt(df2['end'].shift()).cumsum()

out = df2.groupby(group).agg({'start': 'min', 'end': 'max'}).to_dict('records')

output:

[{'start': 10, 'end': 25}, {'start': 30, 'end': 45}, {'start': 60, 'end': 80}]

overlapping

df = pd.DataFrame({'start': [10, 30, 18, 35, 40, 60, 70], 'end': [20, 35, 25, 40, 45, 70, 80]})

df2 = df.sort_values(by=['start', 'end'])
group = df2['start'].gt(df2['end'].shift()).cumsum()
df2.groupby(group).agg({'start': 'min', 'end': 'max'}).to_dict('records')
  • Related