I have a list of dictionaries like this:
a = [
{'user_id':'111','clean_label':'VIR SEPA'},
{'user_id':'112','clean_label':'VIR SEPA'},
{'user_id':'111','clean_label':'VIR SEPA'},
]
and I want this:
a = [
[
{'user_id':'111','clean_label':'VIR SEPA'},
{'user_id':'111','clean_label':'VIR SEPA'}
],
[
{'user_id':'112','clean_label':'VIR SEPA'}
]
]
I tried with sorted and groupby from itertools like this:
sorted(a,key=lambda x: (x['user_id'],x['clean_label']))
[ [tr for tr in tr_per_user_id_clean_label] for key, tr_per_user_id_clean_label in itertools.groupby(a, key=lambda x: (x['user_id'], x['clean_label'])) ]
but I get this:
[[{'user_id': '111', 'clean_label': 'VIR SEPA'}],
[{'user_id': '112', 'clean_label': 'VIR SEPA'}],
[{'user_id': '111', 'clean_label': 'VIR SEPA'}]]
Can someone help me?
Edit: when I sort a, I get:
[{'user_id': '111', 'clean_label': 'VIR SEPA'},
{'user_id': '111', 'clean_label': 'VIR SEPA'},
{'user_id': '112', 'clean_label': 'VIR SEPA'}]
CodePudding user response:
sorted() returns a new list and does not change the order of the existing list. You want either a.sort(key=...) or groupby(sorted(a, key=...), key=...).
Although, why bother sorting at all? You could use a dict as an accumulator, like in mozway's answer.
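For reference, the sorted-then-grouped variant mentioned above could be sketched as follows. This is a minimal sketch that assumes the sample list from the question; group_key is a hypothetical helper name, introduced only so that sorting and grouping share exactly the same key.

```python
from itertools import groupby

# Sample data from the question
a = [
    {'user_id': '111', 'clean_label': 'VIR SEPA'},
    {'user_id': '112', 'clean_label': 'VIR SEPA'},
    {'user_id': '111', 'clean_label': 'VIR SEPA'},
]

def group_key(d):
    # Hypothetical helper: the same key is used for sorting and for grouping
    return (d['user_id'], d['clean_label'])

# groupby only merges consecutive items with equal keys, so the sorted
# result has to be passed to it directly (sorted() leaves `a` unchanged)
grouped = [list(group) for _, group in groupby(sorted(a, key=group_key), key=group_key)]
```

Because sorted() returns a new list, a keeps its original order here; only the list handed to groupby is sorted.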
CodePudding user response:
itertools.groupby is not really the ideal tool for this.
You can achieve your goal with O(n) complexity using a defaultdict (vs O(n log n) with groupby, as you need to sort first):
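from collections import defaultdict
# map each (user_id, clean_label) pair to the list of matching dictionaries
dd = defaultdict(list)
for d in a:
    dd[(d['user_id'], d['clean_label'])].append(d)
# one inner list per distinct key, in first-seen order
out = list(dd.values())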
Alternative with setdefault:
dd = {}
for d in a:
    dd.setdefault((d['user_id'], d['clean_label']), []).append(d)
out = list(dd.values())
output:
[[{'user_id': '111', 'clean_label': 'VIR SEPA'},
{'user_id': '111', 'clean_label': 'VIR SEPA'}],
[{'user_id': '112', 'clean_label': 'VIR SEPA'}]]
If the output needs to be sorted by user_id (clean_label is not numeric, so it is compared as a string rather than converted with int):
out = sorted(dd.values(),
             key=lambda x: (int(x[0]['user_id']), x[0]['clean_label']))