How to parse "{Key=Value}" structs into JSON with Python?

Time:04-15

If I run an Athena query in AWS, the data I get back has structs with key/value pairs that look like this:

{
    "events": "[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]"
}

I can use regular expressions to parse this, but things like special characters make that de-serialization very error-prone.

For example, {deviceType=Android, date=2022-01-01} will run into issues with delimiters if I use regex.

Is there an existing de-serializer for this type of thing?

EDIT:

This is the de-serialize regex I have:

import json
import re

def deserialize(s):
    # Surround every run of word characters with double quotes
    s1 = re.sub(r'(\w+)', r'"\g<1>"', s)

    # Replace = with :
    s2 = re.sub('=', ':', s1)

    return json.loads(s2)

This hits issues when the value contains special characters such as "-" or ".". \w does not match those characters, so the regex splits the value into several "words" and puts the enclosing quotes in the wrong places.
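To make the failure concrete, here is a small sketch of the same quoting step applied to both sample strings from the question. The helper names quote_words and try_load are made up for this illustration; the regex is assumed to be (\w+), as described above.

```python
import json
import re

def quote_words(s):
    # Same two steps as deserialize(), but return the text before json.loads
    quoted = re.sub(r'(\w+)', r'"\g<1>"', s)
    return quoted.replace('=', ':')

def try_load(text):
    # Return the parsed value, or None if the text is not valid JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

ok = quote_words('[{deviceType=Android,logins=400}]')
bad = quote_words('[{deviceType=Android, date=2022-01-01}]')

print(ok)             # [{"deviceType":"Android","logins":"400"}]
print(bad)            # [{"deviceType":"Android", "date":"2022"-"01"-"01"}]
print(try_load(ok))   # [{'deviceType': 'Android', 'logins': '400'}]
print(try_load(bad))  # None
```

The date is broken into three separately quoted pieces, which is why json.loads rejects it.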

CodePudding user response:

Given the data as shown, you can isolate the strings between curly brackets with a regular expression, then split those strings further into their component parts. Here's an example:

import re

d = {'events': "[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]"}

for t in re.findall(r'(?<={).+?(?=})', d['events']):
    for p in t.split(','):
        print(p)

Output:

deviceType=Android
logins=400
deviceType=iPhone
logins=550
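If the goal is JSON rather than printed pairs, the same loop can collect each struct into a dict and hand the result to json.dumps. This is only a sketch: the names records and events_json are made up here, and it assumes that no value contains a comma or a curly bracket and that structs are not nested. All values stay strings.

```python
import json
import re

d = {'events': "[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]"}

records = []
for t in re.findall(r'(?<=\{).+?(?=\})', d['events']):
    record = {}
    for p in t.split(','):
        # Split on the first "=" only, so a value may itself contain "="
        key, value = p.split('=', 1)
        record[key.strip()] = value.strip()
    records.append(record)

events_json = json.dumps(records)
print(events_json)
# [{"deviceType": "Android", "logins": "400"}, {"deviceType": "iPhone", "logins": "550"}]
```

Because the value is taken as everything after the first "=", a date such as 2022-01-01 comes through intact.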

CodePudding user response:

The data inside the quotes is almost JSON but it's missing the quotes around keys and values. With a few judiciously chained .replace() method calls, you should be able to convert it from almost-JSON to JSON and then deserialize it using the json module:

import json
obj = {"events": "[{deviceType=Android, date=2022-01-01}]"}
events = obj['events']
events_json = events.replace(', ', ',').replace('{', '{"').replace('}', '"}').replace('=', '":"').replace(',', '","').replace('}","{','},{')
parsed = json.loads(events_json)
print(parsed[0])

print(parsed[0]['deviceType']) # prints Android
print(parsed[0]['date']) # prints 2022-01-01

*Edit to fix an issue raised by MisterMiyagi.
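For reuse, the chain can be wrapped in a function and tried on the two-record sample from the question. The function name parse_structs is made up for this sketch. It assumes keys and values never contain "=", ",", "{", "}" or double quotes, and every value comes back as a string.

```python
import json

def parse_structs(text):
    # Convert "[{k=v,k=v},{k=v}]" into a list of dicts via the chained replaces
    as_json = (
        text.replace(', ', ',')        # drop the space after separators
            .replace('{', '{"')        # open the first key
            .replace('}', '"}')        # close the last value
            .replace('=', '":"')       # key/value separator
            .replace(',', '","')       # pair separator (also hits "},{")
            .replace('}","{', '},{')   # undo the quoting between records
    )
    return json.loads(as_json)

rows = parse_structs("[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]")
print(rows[1]['logins'])  # prints 550

dated = parse_structs("[{deviceType=Android, date=2022-01-01}]")
print(dated[0]['date'])   # prints 2022-01-01
```

The last replace matters as soon as there is more than one record: the comma between "}" and "{" is also wrapped in quotes by the previous step and has to be restored.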
