Converting a libpostal output string (dict format with duplicate keys) to a dict

Time:12-13

I am using the libpostal address-parsing library as an .exe file. I have a script that reads its output from the terminal. The output is a string in dict format, as shown below.

This is the address string:

"531A UPPER CROSS STREETSINGAPORE HONG LIM COMPLEX 051531 S"

The libpostal terminal output is:

'{\n  "house_number": "531a",\n  "road": "upper cross streetsingapore",\n  "city": "hong",\n  "house": "lim complex",\n  "house_number": "051531 s"\n}'

I need to create a dict from this string. If a key appears more than once, its values should be joined together (separated by a space) under that single key.

Expected output dict:

{
  "house_number": "531a 051531 s",
  "road": "upper cross streetsingapore",
  "city": "hong",
  "house": "lim complex",
}

Any help would be appreciated.

CodePudding user response:

You can use json.JSONDecoder with an object_pairs_hook to decode the dict literal into a list of (key, value) tuples, use dict.setdefault to collect the values of each key in a list, and finally join each list into a single string:

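from json import JSONDecoder

string = '{\n  "house_number": "531a",\n  "road": "upper cross streetsingapore",\n  "city": "hong",\n  "house": "lim complex",\n  "house_number": "051531 s"\n}'

# The hook receives the (key, value) pairs in order; returning them unchanged keeps duplicate keys
pairs = JSONDecoder(object_pairs_hook=lambda x: x).decode(string)

# Collect the values of each key in a list
out = {}
for key, value in pairs:
    out.setdefault(key, []).append(value)

# Join each key's values into one space-separated string
out = {k: ' '.join(v) for k, v in out.items()}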

Output:

{'house_number': '531a 051531 s',
 'road': 'upper cross streetsingapore',
 'city': 'hong',
 'house': 'lim complex'}
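
As a variation on the same idea, the merging can be done inside the hook itself, so that json.loads returns the merged dict directly. This is only a sketch: the helper name merge_duplicate_keys is made up for illustration, and it assumes every value in the libpostal output is a string.

```python
import json


def merge_duplicate_keys(pairs):
    """Hypothetical helper: fold (key, value) pairs into a dict,
    joining the values of repeated keys with a space.
    Assumes all values are strings."""
    merged = {}
    for key, value in pairs:
        if key in merged:
            merged[key] = f"{merged[key]} {value}"
        else:
            merged[key] = value
    return merged


raw = '{\n  "house_number": "531a",\n  "road": "upper cross streetsingapore",\n  "city": "hong",\n  "house": "lim complex",\n  "house_number": "051531 s"\n}'

# json.loads calls the hook with the ordered list of pairs for each JSON object
parsed = json.loads(raw, object_pairs_hook=merge_duplicate_keys)
print(parsed)
```

Without the hook, json.loads(raw) would silently keep only the last value for a repeated key, which is why the pairs have to be intercepted before a dict is built.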