Group lists by final item, but keep order

Time:09-21

I have a list-of-lists, Corp. It looks like this:

Corp = [
['the\tthe Def Det  _1', '_1'],
['dogs\tdog N Sg @SUBJ>  _1', '_1'],
['bark\tmiskan V 3Sg @PRED  _1', '_1'],
['.\t? CLB  _1', '_1'],
['it\tit Pron 3Sg @SUBJ>  _2', '_2'],
['scared\tscare V Pst @PRED  _2', '_2'],
['me\tI Pron 1Sg @OBJ<  _2', '_2'],
...
]

What I want to do is group these so that all items with the same sentence index (the final item in each list) end up in one sub-list, with their original order kept, like so:

[
[['the\tthe Def Det  _1', '_1'],
['dogs\tdog N Sg @SUBJ>  _1', '_1'],
['bark\tmiskan V 3Sg @PRED  _1', '_1'],
['.\t? CLB  _1', '_1']],

[['it\tit Pron 3Sg @SUBJ>  _2', '_2'],
['scared\tscare V Pst @PRED  _2', '_2'],
['me\tI Pron 1Sg @OBJ<  _2', '_2']]
...
]

I have tried using itemgetter and groupby (from the operator and itertools modules, respectively). The issue is that they seem to reorder the items inside each new embedded list:

from itertools import groupby
from operator import itemgetter

groupedcorp = [[x for x, y in g]
               for k, g in groupby(splitcorp, key=itemgetter(1))]
[
[['.\t? CLB  _1', '_1'],
['dogs\tdog N Sg @SUBJ>  _1', '_1'],
['bark\tmiskan V 3Sg @PRED  _1', '_1'],
['the\tthe Def Det  _1', '_1']],

[['scared\tscare V Pst @PRED  _2', '_2'],
['it\tit Pron 3Sg @SUBJ>  _2', '_2'],
['me\tI Pron 1Sg @OBJ<  _2', '_2']]
...
]

I'm fine with the final sentence index (the second item in each atomic list) being eaten up.

Any help would be appreciated.

CodePudding user response:

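As far as I understand the question (not 100% sure), the code below produces the grouping you are looking for. Rows are appended to each group in input order, so the order within each sentence is kept.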

from collections import defaultdict

lists = [['the\tthe Def Det  _1', '_1'],
         ['dogs\tdog N Sg @SUBJ>  _1', '_1'],
         ['bark\tmiskan V 3Sg @PRED  _1', '_1'],
         ['.\t? CLB  _1', '_1'],
         ['it\tit Pron 3Sg @SUBJ>  _2', '_2'],
         ['scared\tscare V Pst @PRED  _2', '_2'],
         ['me\tI Pron 1Sg @OBJ<  _2', '_2']]

# Map each sentence index (the last item of every inner list) to its rows.
# Dicts keep insertion order (Python 3.7+), so groups appear in first-seen order.
data = defaultdict(list)
for lst in lists:
    data[lst[-1]].append(lst)

for k, v in data.items():
    print(f'{k} -> {v}')

Output:

_1 -> [['the\tthe Def Det  _1', '_1'], ['dogs\tdog N Sg @SUBJ>  _1', '_1'], ['bark\tmiskan V 3Sg @PRED  _1', '_1'], ['.\t? CLB  _1', '_1']]
_2 -> [['it\tit Pron 3Sg @SUBJ>  _2', '_2'], ['scared\tscare V Pst @PRED  _2', '_2'], ['me\tI Pron 1Sg @OBJ<  _2', '_2']]
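
If you need the nested list from the question rather than a dict, something along these lines might work. It is only a sketch: `group_by_last` and `group_consecutive` are hypothetical helper names, and the rows are the sample from the question. Note that `itertools.groupby` does not reorder anything by itself; it only merges adjacent items with equal keys. So if the order changed, my guess (an assumption, since that code is not shown) is that the rows were sorted or otherwise reordered before grouping.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows copied from the question; the last element is the sentence index.
rows = [
    ['the\tthe Def Det  _1', '_1'],
    ['dogs\tdog N Sg @SUBJ>  _1', '_1'],
    ['bark\tmiskan V 3Sg @PRED  _1', '_1'],
    ['.\t? CLB  _1', '_1'],
    ['it\tit Pron 3Sg @SUBJ>  _2', '_2'],
    ['scared\tscare V Pst @PRED  _2', '_2'],
    ['me\tI Pron 1Sg @OBJ<  _2', '_2'],
]


def group_by_last(rows):
    """Group rows by their last element, keeping first-seen order (hypothetical helper)."""
    groups = {}  # plain dicts preserve insertion order in Python 3.7+
    for row in rows:
        groups.setdefault(row[-1], []).append(row)
    return list(groups.values())


def group_consecutive(rows):
    """Group runs of adjacent rows sharing the same last element (hypothetical helper)."""
    # No sort beforehand: groupby only merges neighbours, so the input order survives.
    return [list(g) for _, g in groupby(rows, key=itemgetter(-1))]


grouped = group_by_last(rows)

# Optionally drop the sentence index and keep only the first item of each row.
tokens_only = [[row[0] for row in group] for group in grouped]
```

Both helpers give the same result when rows of one sentence are already adjacent, as in the sample. If a sentence index can reappear later in the input, `group_by_last` merges those rows into one group while `group_consecutive` starts a new group each time.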