I had a panda DataFrame looks like
start end
0 10 20
1 30 35
2 20 25
3 35 40
4 40 45
5 60 70
6 70 80
Expecting Output like
[{"start":10,"end":25},{"start":30,"end":45},{"start":60,"end":80}]
The output is calculated based on start from min of start and check the end value have any other start and take it and go continue check
CodePudding user response:
You can compute a group on the sorted items and convert to_dict
:
df2 = df.sort_values(by=['start', 'end'])
group = df2['start'].ne(df2['end'].shift()).cumsum()
# or if overlapping
# group = df2['start'].gt(df2['end'].shift()).cumsum()
out = df2.groupby(group).agg({'start': 'min', 'end': 'max'}).to_dict('records')
output:
[{'start': 10, 'end': 25}, {'start': 30, 'end': 45}, {'start': 60, 'end': 80}]
overlapping
df = pd.DataFrame({'start': [10, 30, 18, 35, 40, 60, 70], 'end': [20, 35, 25, 40, 45, 70, 80]})
df2 = df.sort_values(by=['start', 'end'])
group = df2['start'].gt(df2['end'].shift()).cumsum()
df2.groupby(group).agg({'start': 'min', 'end': 'max'}).to_dict('records')