I have a list of dictionaries we'll call "sessions". Each session has a list of "parts", a session mode and start/end timestamps:
sessions_arr = [
{'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
{'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
{'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
{'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]
If consecutive sessions have the same "session_mode", their parts need to be merged into a single element, so I want the output to look like this:
sessions_arr = [
{'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
{'parts': [3, 4, 5, 6], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706400},
{'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]
Notice how the TS_end of the second element was also updated accordingly. I always want to merge a session back into the one before it, and only when the two elements are consecutive.
This is what I have so far:
for i in range(len(sessions_arr_copy) - 1):
    if sessions_arr[i]['session_mode'] == sessions_arr[i + 1]['session_mode']:
        # if they match move that session into the one before it
        sessions_arr[i]['parts'].extend(sessions_arr[i + 1]['parts'])
        sessions_arr[i]['TS_end'] = sessions_arr[i + 1]['TS_end']
        sessions_arr.pop(i + 1)
The issue with this implementation is that popping the element I just merged into the previous one shrinks the list I'm iterating over, while the range of indices stays the same, so I eventually get an IndexError. I understand why this error is occurring; I just want to know how to work around it. I would like to do this with only one for loop, as the list can get pretty big, but it doesn't have to be the fastest algorithm either.
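A minimal sketch of one workaround that keeps the in-place, single-loop approach: swap the for loop for a while loop that re-reads len(sessions) on every pass and only advances the index when nothing was merged. The function name merge_in_place is hypothetical, and the dictionary layout is assumed to match the example above.

```python
def merge_in_place(sessions):
    """Merge consecutive sessions with the same mode, modifying the list in place."""
    i = 0
    # len(sessions) is re-evaluated on every pass, so popping never runs past the end
    while i < len(sessions) - 1:
        current, nxt = sessions[i], sessions[i + 1]
        if current['session_mode'] == nxt['session_mode']:
            # fold the next session into the current one and drop it;
            # i is not advanced because the new neighbour may match too
            current['parts'].extend(nxt['parts'])
            current['TS_end'] = nxt['TS_end']
            sessions.pop(i + 1)
        else:
            i += 1
    return sessions


sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500},
]
merge_in_place(sessions_arr)
print(sessions_arr)
```

Each pop(i + 1) shifts the rest of the list, so the worst case is quadratic, which fits the "doesn't have to be the fastest" constraint but is worth knowing for very large lists.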
Answer:
I would break this into two parts:
1. A function that knows how to merge a set of dicts into a single merged dict.
2. itertools.groupby to group the list on the key you want.
Together that might look something like:
from itertools import groupby

def merge(dicts):
    merged_parts = [part for line in dicts for part in line['parts']]
    start = dicts[0]['TS_start']
    end = dicts[-1]['TS_end']
    return {
        'parts': merged_parts,
        'session_mode': dicts[0]['session_mode'],
        'TS_start': start,
        'TS_end': end
    }
sessions_arr = [
{'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
{'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
{'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
{'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500},
]
merged_sessions = [merge(list(g)) for _, g in groupby(sessions_arr, key=lambda d: d['session_mode'])]
This will leave you with a new list looking like:
[
{'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
{'parts': [3, 4, 5, 6], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706400},
{'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]
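This works because itertools.groupby only groups runs of consecutive items with equal keys; it starts a new group whenever the key changes rather than collecting every equal key, which is exactly the "only if consecutive" requirement. A small illustration on just the modes from the example:

```python
from itertools import groupby

modes = ['Driving', 'Idling', 'Idling', 'Driving']
# groupby starts a new group each time the key changes, so the two
# non-adjacent 'Driving' entries end up in separate groups
runs = [(mode, len(list(group))) for mode, group in groupby(modes)]
print(runs)  # [('Driving', 1), ('Idling', 2), ('Driving', 1)]
```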
If your groups are large, you could improve this by not creating the temporary list(g) and making the merge() function accept an iterator instead.
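A sketch of that variation, assuming the same dictionary layout: merge() consumes the group iterator in a single pass, taking the mode and TS_start from the first session and TS_end from the last one it sees. Copying the first parts list is an added choice so the input dicts are not modified.

```python
from itertools import groupby


def merge(dicts):
    """Merge an iterable of consecutive same-mode sessions without building a temp list."""
    it = iter(dicts)
    first = next(it)  # groupby never yields empty groups, so this is safe
    merged = {
        'parts': list(first['parts']),  # copy so the input dict isn't mutated
        'session_mode': first['session_mode'],
        'TS_start': first['TS_start'],
        'TS_end': first['TS_end'],
    }
    for d in it:
        merged['parts'].extend(d['parts'])
        merged['TS_end'] = d['TS_end']  # the last session seen supplies TS_end
    return merged


sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500},
]
merged_sessions = [merge(g) for _, g in groupby(sessions_arr, key=lambda d: d['session_mode'])]
print(merged_sessions)
```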
Answer:
The problem comes from modifying the list while you iterate over it. I would build a new list instead and accumulate the merged sessions in it.
sessions_arr = [
{'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
{'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
{'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
{'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]
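helper = []
idx = 0
# two neighbouring sessions can be merged when they share the same mode
is_consecutive = lambda sess1, sess2: sess1['session_mode'] == sess2['session_mode']
helper.append(sessions_arr[0])
for item in sessions_arr[1:]:
    if is_consecutive(helper[idx], item):
        # same mode as the last kept session: fold this one into it
        helper[idx]['parts'].extend(item['parts'])
        helper[idx]['TS_end'] = item['TS_end']
    else:
        # mode changed: start a new session and point idx at it
        helper.append(item)
        idx += 1
print(helper)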
Output:
[{'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
{'parts': [3, 4, 5, 6], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706400},
{'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}]
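Note that helper holds the same dict objects as sessions_arr, so the extend() call above also changes the original data (sessions_arr[1]['parts'] ends up as [3, 4, 5, 6]). If the input must stay untouched, a sketch like the following copies each session when it starts a new run; the name merge_sessions is hypothetical, and merged[-1] replaces the manual idx counter.

```python
def merge_sessions(sessions):
    """Return a new list of merged sessions, leaving the input untouched."""
    merged = []
    for item in sessions:
        if merged and merged[-1]['session_mode'] == item['session_mode']:
            # same mode as the previous kept session: extend it
            merged[-1]['parts'].extend(item['parts'])
            merged[-1]['TS_end'] = item['TS_end']
        else:
            # new run: store a copy with its own parts list so later
            # extend() calls don't modify the caller's dictionaries
            merged.append({**item, 'parts': list(item['parts'])})
    return merged


sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500},
]
merged = merge_sessions(sessions_arr)
print(merged)
```

The "if merged and ..." guard also makes an empty input return an empty list instead of failing on sessions_arr[0].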