Transform multiple tuples into nested dictionary

Time:10-29

I have this set of tuples:

tokens = [('abstract', '1'), ('text', '1'), ('oie', '1'), ('idk', '1'), ('idk', '2'), ('pos', '2'), ('idk', '2'), ('idk', '2'), ('com', '2'), ('ggg', '4'), ('obama', '4'), ('joe', '4'), ('idk', '4')]

And I need to put it into a nested dictionary like this:

dict_items([('abstract', {'1': 1}), ('text', {'1': 1}), ('oie', {'1': 1}), ('idk', {'1': 1, '2': 3, '4': 1}), ('pos', {'2': 1}), ('com', {'2': 1}), ('ggg', {'4': 1}), ('obama', {'4': 1}), ('joe', {'4': 1})])

That is: "term": {"text file number": "number of appearances"}

So the term "idk" appears once in document 1, three times in document 2, and once in document 4.

CodePudding user response:

Use:

tokens = [('abstract', '1'), ('text', '1'), ('oie', '1'), ('idk', '1'), ('idk', '2'), ('pos', '2'),
          ('idk', '2'), ('idk', '2'), ('com', '2'), ('ggg', '4'), ('obama', '4'), ('joe', '4'), ('idk', '4')]

res = {}
for o, i in tokens:
    if o not in res:
        res[o] = {}
    if i not in res[o]:
        res[o][i] = 0
    res[o][i] += 1

print(res)

Output

{'abstract': {'1': 1}, 'text': {'1': 1}, 'oie': {'1': 1}, 'idk': {'1': 1, '2': 3, '4': 1}, 'pos': {'2': 1}, 'com': {'2': 1}, 'ggg': {'4': 1}, 'obama': {'4': 1}, 'joe': {'4': 1}}

One alternative is to use collections.defaultdict:

from collections import defaultdict

tokens = [('abstract', '1'), ('text', '1'), ('oie', '1'), ('idk', '1'), ('idk', '2'), ('pos', '2'), ('idk', '2'), ('idk', '2'), ('com', '2'), ('ggg', '4'), ('obama', '4'), ('joe', '4'), ('idk', '4')]


d = defaultdict(lambda : defaultdict(int))

for o, i in tokens:
    d[o][i] += 1

res = { k : dict(v) for k, v in d.items()}
print(res)

Output

{'abstract': {'1': 1}, 'text': {'1': 1}, 'oie': {'1': 1}, 'idk': {'1': 1, '2': 3, '4': 1}, 'pos': {'2': 1}, 'com': {'2': 1}, 'ggg': {'4': 1}, 'obama': {'4': 1}, 'joe': {'4': 1}}
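The conversion to plain dicts in the last step matters because reading a missing key from a defaultdict inserts that key. A minimal sketch of the difference, using a shortened token list and the made-up term 'unknown' purely for illustration:

```python
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))
for term, doc in [('idk', '1'), ('idk', '2'), ('idk', '2')]:
    d[term][doc] += 1

# Convert to plain dicts before handing the result to other code.
res = {term: dict(counts) for term, counts in d.items()}

# A plain dict can be queried for a missing term without creating it.
count = res.get('unknown', {}).get('1', 0)

# The defaultdict, by contrast, inserts the key as a side effect of the read.
_ = d['unknown']['1']

print(count)             # 0
print('unknown' in res)  # False
print('unknown' in d)    # True
```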

A third alternative is to use collections.Counter:

from collections import Counter

tokens = [('abstract', '1'), ('text', '1'), ('oie', '1'), ('idk', '1'), ('idk', '2'), ('pos', '2'),
          ('idk', '2'), ('idk', '2'), ('com', '2'), ('ggg', '4'), ('obama', '4'), ('joe', '4'), ('idk', '4')]

d = {}
for (o, i), value in Counter(tokens).items():
    if o not in d:
        d[o] = {}
    d[o][i] = value

print(d)
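If you would rather avoid the imports, the same counting can probably be done with dict.setdefault and dict.get. This is only a sketch: the function name count_terms and the variable names are made up here and do not come from the answers above.

```python
def count_terms(tokens):
    """Map each term to {document id: number of occurrences}."""
    index = {}
    for term, doc in tokens:
        # setdefault returns the existing inner dict, or stores and returns a new one.
        per_doc = index.setdefault(term, {})
        # get supplies 0 the first time a (term, doc) pair is seen.
        per_doc[doc] = per_doc.get(doc, 0) + 1
    return index


tokens = [('abstract', '1'), ('text', '1'), ('oie', '1'), ('idk', '1'), ('idk', '2'), ('pos', '2'),
          ('idk', '2'), ('idk', '2'), ('com', '2'), ('ggg', '4'), ('obama', '4'), ('joe', '4'), ('idk', '4')]

index = count_terms(tokens)
print(index['idk'])  # {'1': 1, '2': 3, '4': 1}
```

Wrapping the loop in a function keeps the result a plain dict throughout, so no conversion step is needed at the end.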