I have a list of lists, Corp. It looks like this:
Corp = [
    ['the\tthe Def Det _1', '_1'],
    ['dogs\tdog N Sg @SUBJ> _1', '_1'],
    ['bark\tmiskan V 3Sg @PRED _1', '_1'],
    ['.\t? CLB _1', '_1'],
    ['it\tit Pron 3Sg @SUBJ> _2', '_2'],
    ['scared\tscare V Pst @PRED _2', '_2'],
    ['me\tI Pron 1Sg @OBJ< _2', '_2'],
    ...
]
What I want to do is group these so that all items with the same sentence index (the final item in each list) end up together, like so:
[
    [['the\tthe Def Det _1', '_1'],
     ['dogs\tdog N Sg @SUBJ> _1', '_1'],
     ['bark\tmiskan V 3Sg @PRED _1', '_1'],
     ['.\t? CLB _1', '_1']],
    [['it\tit Pron 3Sg @SUBJ> _2', '_2'],
     ['scared\tscare V Pst @PRED _2', '_2'],
     ['me\tI Pron 1Sg @OBJ< _2', '_2']],
    ...
]
I have tried using itemgetter and groupby from the operator and itertools modules (respectively). The issue with this is that they seem to reorder the items inside the new embedded lists:
from itertools import groupby
from operator import itemgetter
groupedcorp = [[x for x, y in g]
               for k, g in groupby(splitcorp, key=itemgetter(1))]
[
    [['.\t? CLB _1', '_1'],
     ['dogs\tdog N Sg @SUBJ> _1', '_1'],
     ['bark\tmiskan V 3Sg @PRED _1', '_1'],
     ['the\tthe Def Det _1', '_1']],
    [['scared\tscare V Pst @PRED _2', '_2'],
     ['it\tit Pron 3Sg @SUBJ> _2', '_2'],
     ['me\tI Pron 1Sg @OBJ< _2', '_2']],
    ...
]
I'm fine with the final sentence index (the second item in each atomic list) being eaten up.
Any help would be appreciated.
CodePudding user response:
As far as I understand the question (I'm not 100% sure), the grouping below is what you are looking for.
from collections import defaultdict

# Maps each sentence index to the rows that carry it
data = defaultdict(list)
lists = [['the\tthe Def Det _1', '_1'],
         ['dogs\tdog N Sg @SUBJ> _1', '_1'],
         ['bark\tmiskan V 3Sg @PRED _1', '_1'],
         ['.\t? CLB _1', '_1'],
         ['it\tit Pron 3Sg @SUBJ> _2', '_2'],
         ['scared\tscare V Pst @PRED _2', '_2'],
         ['me\tI Pron 1Sg @OBJ< _2', '_2']
         ]
# The last element of each row is the sentence index used as the key
for lst in lists:
    data[lst[-1]].append(lst)
for k, v in data.items():
    print(f'{k} -> {v}')
Output:
_1 -> [['the\tthe Def Det _1', '_1'], ['dogs\tdog N Sg @SUBJ> _1', '_1'], ['bark\tmiskan V 3Sg @PRED _1', '_1'], ['.\t? CLB _1', '_1']]
_2 -> [['it\tit Pron 3Sg @SUBJ> _2', '_2'], ['scared\tscare V Pst @PRED _2', '_2'], ['me\tI Pron 1Sg @OBJ< _2', '_2']]
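If the goal is the nested list shown in the question rather than a dict, the values of the dict can be collected into a list. The sketch below assumes Python 3.7+ (where dicts keep insertion order) and uses hypothetical names (corp, group_by_index, group_consecutive) that are not from the question. It also includes a groupby variant for comparison: groupby only groups adjacent items with equal keys and yields them in input order, so on input that is already in sentence order it should give the same result without reordering anything.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the question; the last element is the sentence index.
corp = [
    ['the\tthe Def Det _1', '_1'],
    ['dogs\tdog N Sg @SUBJ> _1', '_1'],
    ['bark\tmiskan V 3Sg @PRED _1', '_1'],
    ['.\t? CLB _1', '_1'],
    ['it\tit Pron 3Sg @SUBJ> _2', '_2'],
    ['scared\tscare V Pst @PRED _2', '_2'],
    ['me\tI Pron 1Sg @OBJ< _2', '_2'],
]


def group_by_index(rows):
    """Group rows by their last element, keeping first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[-1]].append(row)
    # Assumption: Python 3.7+, so the dict preserves insertion order.
    return list(groups.values())


def group_consecutive(rows):
    """Group adjacent rows that share the same last element."""
    # groupby does not sort; it starts a new group whenever the key changes.
    return [list(g) for _, g in groupby(rows, key=itemgetter(-1))]


grouped = group_by_index(corp)
for sentence in grouped:
    print(sentence)
```

The defaultdict version also works when rows with the same index are not adjacent, whereas the groupby version would split them into separate groups. To drop the index and keep only the token strings, each group could be mapped with something like [row[0] for row in sentence].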