List of tuples to nested dictionary based on tuple's values

Time:04-28

Given a list of tuples like

[(1, 'Japan', 1, 'Tokyo'), (1, 'Japan', 2, 'Osaka'), (2, 'Korea', 1, 'Seoul'), (2, 'Korea', 2, 'Pyongyang')]
# country_id, country_name, city_id, city_name

I wish to structure it into this:

{
  'countries': [
    {
      'country_id': 1,
      'country_name': 'Japan',
      'cities': [
        {
          'city_id': 1,
          'city_name': 'Tokyo'
        },
        {
          'city_id': 2,
          'city_name': 'Osaka'
        }
      ]
    },
    {
      'country_id': 2,
      'country_name': 'Korea',
      'cities': [
        {
          'city_id': 1,
          'city_name': 'Seoul'
        },
        {
          'city_id': 2,
          'city_name': 'Pyongyang'
        }
      ]
    }
  ]
}

I implemented this and it works, but it is not Pythonic. I'm wondering whether it can be refined or sped up significantly, since it produces the response of an API.

x = [(1, 'Japan', 1, 'Tokyo'), (1, 'Japan', 2, 'Osaka'), (2, 'Korea', 1, 'Seoul'), (2, 'Korea', 2, 'Pyongyang')]
countrylist = []
query_countries = []
for a in x:
    if a[0] not in countrylist:
        query_countries.append((a[0], a[1]))
        countrylist.append(a[0])
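# note: this round-trip is redundant (the list already has no duplicates) and, in general,
# does not preserve the original order, so the .index() lookups below could pick the wrong country
countrylist = list(set(countrylist))
countries = [{'country_id': r[0], 'country_name': r[1], 'cities': []} for r in query_countries]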
for r in x:
    countries[countrylist.index(r[0])]['cities'].append({'city_id': r[2], 'city_name': r[3]})
final = {'countries': countries}
print(final)
#{'countries': [{'country_id': 1, 'country_name': 'Japan', 'cities': [{'city_id': 1, 'city_name': 'Tokyo'}, {'city_id': 2, 'city_name': 'Osaka'}]}, {'country_id': 2, 'country_name': 'Korea', 'cities': [{'city_id': 1, 'city_name': 'Seoul'}, {'city_id': 2, 'city_name': 'Pyongyang'}]}]}

Answer:

The expressions a[0] not in countrylist and countrylist.index(r[0]) are not efficient: countrylist is a list, so each of these operations scans it, and the total work grows with the number of rows times the number of countries. At some point you turn it into a set, but you could have used a set for the membership test from the start (and a dict mapping each country id to its entry in place of .index()). Those lookups take constant time on average, which would already improve performance for large inputs.
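To make the lookup suggestion concrete, here is a minimal sketch (not the answerer's code) that replaces both lists with a single dict keyed by country_id; the name by_id is illustrative. Because dicts preserve insertion order (guaranteed since Python 3.7), countries come out in the order they first appear, and this version does not require the rows to be sorted.

```python
# One pass over the rows; by_id maps country_id -> the country dict already
# placed in the output, so each lookup is O(1) on average instead of a list scan.
data = [(1, 'Japan', 1, 'Tokyo'), (1, 'Japan', 2, 'Osaka'),
        (2, 'Korea', 1, 'Seoul'), (2, 'Korea', 2, 'Pyongyang')]

by_id = {}  # insertion-ordered, so countries keep their first-seen order
for country_id, country_name, city_id, city_name in data:
    country = by_id.get(country_id)
    if country is None:
        # First row for this country: create its entry with an empty city list.
        country = {'country_id': country_id, 'country_name': country_name, 'cities': []}
        by_id[country_id] = country
    country['cities'].append({'city_id': city_id, 'city_name': city_name})

final = {'countries': list(by_id.values())}
print(final)
```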

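For this kind of task, itertools.groupby and operator.itemgetter are good tools. They let you do the whole job in a single expression, provided the rows are already ordered by country (groupby only groups consecutive rows with equal keys, which holds for the data in the question):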

from itertools import groupby
from operator import itemgetter

data = [(1, 'Japan', 1, 'Tokyo'), (1, 'Japan', 2, 'Osaka'), (2, 'Korea', 1, 'Seoul'), (2, 'Korea', 2, 'Pyongyang')]

result = [
    {
        "country_id": country_id,
        "country_name": country_name,
        "cities": [
            {
                "city_id": city_id,
                "city_name": city_name
            } for *_, city_id, city_name in cities
        ]
    } for (country_id, country_name), cities in groupby(data, itemgetter(0, 1))
]

print({"countries": result})  # wrap the list in the shape the question asks for

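If the rows might not arrive grouped by country, one option is to sort them by the same key before calling groupby. The following is a minimal sketch under that assumption; the shuffled data and the name country_key are illustrative, not from the question. sorted() is stable, so cities keep their original relative order within each country.

```python
from itertools import groupby
from operator import itemgetter

# Same rows as in the question, deliberately out of order (illustrative data).
data = [(2, 'Korea', 1, 'Seoul'), (1, 'Japan', 1, 'Tokyo'),
        (2, 'Korea', 2, 'Pyongyang'), (1, 'Japan', 2, 'Osaka')]

country_key = itemgetter(0, 1)  # (country_id, country_name)

# groupby only merges *consecutive* rows with equal keys, so sort by that key first.
rows = sorted(data, key=country_key)

result = {
    "countries": [
        {
            "country_id": country_id,
            "country_name": country_name,
            "cities": [
                {"city_id": city_id, "city_name": city_name}
                for *_, city_id, city_name in cities
            ],
        }
        for (country_id, country_name), cities in groupby(rows, country_key)
    ]
}
print(result)
```

Without the sort, groupby(data, country_key) on this shuffled list would yield four groups (Korea, Japan, Korea, Japan) instead of two.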
groupby

You can imagine this groupby call returning the following structure, a sequence of (key, group) pairs:

[
    ((1, 'Japan'), [
        (1, 'Japan', 1, 'Tokyo'),
        (1, 'Japan', 2, 'Osaka')
    ]),
    ((2, 'Korea'), [
        (2, 'Korea', 1, 'Seoul'),
        (2, 'Korea', 2, 'Pyongyang')
    ])
]

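...except that the groups are not lists but iterators, and the outer sequence is itself an iterator. For a for ... in loop that makes no difference, as long as each group is consumed before groupby moves on to the next key, which the nested comprehension does.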
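To see the actual pairs, you can materialize each group with list() while iterating. The sketch below also illustrates the pitfall of keeping group iterators around: the itertools documentation notes that once groupby advances, the previous group is no longer visible, so in current CPython a stored group reads as empty. The names materialized and stale are illustrative.

```python
from itertools import groupby
from operator import itemgetter

data = [(1, 'Japan', 1, 'Tokyo'), (1, 'Japan', 2, 'Osaka'),
        (2, 'Korea', 1, 'Seoul'), (2, 'Korea', 2, 'Pyongyang')]

# Convert each group to a list *before* groupby advances to the next key.
materialized = [(key, list(group)) for key, group in groupby(data, itemgetter(0, 1))]
for key, rows in materialized:
    print(key, rows)

# Pitfall: collect the group iterators first and read them later.
# By then groupby has moved past them, so every stored group comes out empty.
stale = [(key, group) for key, group in groupby(data, itemgetter(0, 1))]
stale_contents = [(key, list(group)) for key, group in stale]
print(stale_contents)
```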

The inner tuples are just references to the original rows, while the keys in the outer layer are produced by itemgetter(0, 1), which returns a tuple of the first two values of each row.
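A quick sketch of what itemgetter returns. With several indices it builds a tuple, which is why the outer keys look like (1, 'Japan'); with a single index it returns the bare item, not a 1-tuple. The variable names here are illustrative.

```python
from operator import itemgetter

row = (1, 'Japan', 2, 'Osaka')

country_key = itemgetter(0, 1)  # several indices -> returns a tuple
first_only = itemgetter(0)      # one index -> returns the item itself

print(country_key(row))  # (1, 'Japan')
print(first_only(row))   # 1

# Equivalent lambda, for comparison:
print((lambda r: (r[0], r[1]))(row) == country_key(row))  # True
```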
