Split string to dict, where delimiter is part of value

Time:07-19

I can't figure out a solution to this problem.

I have this example string:

test4 = "versandkostenfrei=Ja,delivery_time=sofort lieferbar,instantly_deliverable=true,spannung=7,2 Volt"

I'd like to convert this into a dict where everything before each = sign is a key and everything after it is the value, up to the comma that separates it from the next pair. The main problem is that some of the values (after the equal sign) contain a comma themselves. Also, across the entire data set, the last pair here, spannung=7,2 Volt, can appear somewhere in the middle of the string.

Desired output:

{
  "versandkostenfrei": "Ja",
  "delivery_time": "sofort lieferbar",
  "instantly_deliverable": "true",
  "spannung": "7,2 Volt"
}

It doesn't matter whether the boolean value is also wrapped in double quotes.
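For illustration, a plain split on every comma shows the problem. This short sketch uses only `str.split`; the names `pieces` and `naive_ok` are just illustrative:

```python
test4 = "versandkostenfrei=Ja,delivery_time=sofort lieferbar,instantly_deliverable=true,spannung=7,2 Volt"

# Naive approach: treat every comma as a pair separator.
pieces = test4.split(",")
print(pieces)
# ['versandkostenfrei=Ja', 'delivery_time=sofort lieferbar',
#  'instantly_deliverable=true', 'spannung=7', '2 Volt']

try:
    dict(p.split("=", 1) for p in pieces)
    naive_ok = True
except ValueError:
    # '2 Volt' contains no "=", so it cannot become a key/value pair.
    naive_ok = False
print("naive split works:", naive_ok)  # prints: naive split works: False
```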

Answer:

Split the string with a regex that matches each key followed by "=" (with an optional preceding ","), capturing the key as a group.
Because the key is captured, re.split keeps it in the result, so you get a list of alternating keys and values (the first item is an empty string).
Then collect them into key/value tuples to build the dict.

import re
test4 = "versandkostenfrei=Ja,delivery_time=sofort lieferbar,instantly_deliverable=true,spannung=7,2 Volt"
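# The capturing group keeps each key in the result: ['', key1, value1, key2, value2, ...]
parts = re.split(r",?(\w+)=", test4)
# Pair every key (odd index) with the value that follows it.
output = dict((parts[i], parts[i + 1]) for i in range(1, len(parts), 2))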
print(output)

This creates:

{
    'versandkostenfrei': 'Ja',
    'delivery_time': 'sofort lieferbar',
    'instantly_deliverable': 'true',
    'spannung': '7,2 Volt'
}
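To make the "alternating keys and values" step concrete, here is a sketch that prints the intermediate list and pairs it with slicing and `zip` instead of an index loop. Like the answer above, it assumes keys consist only of word characters (`\w`) and that no value contains a `word=` sequence:

```python
import re

test4 = "versandkostenfrei=Ja,delivery_time=sofort lieferbar,instantly_deliverable=true,spannung=7,2 Volt"

# The capturing group makes re.split keep each key in the result list.
parts = re.split(r",?(\w+)=", test4)
print(parts)
# ['', 'versandkostenfrei', 'Ja', 'delivery_time', 'sofort lieferbar',
#  'instantly_deliverable', 'true', 'spannung', '7,2 Volt']

# Skip the leading '' and pair each odd-indexed key with the value after it.
output = dict(zip(parts[1::2], parts[2::2]))
print(output)
```

The comma in "7,2 Volt" survives because ",2 Volt" is never followed by "=", so the pattern does not match there.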

Answer:

import re
dict(map(lambda x: x.split("=", 1), [x.group() for x in re.finditer(r"[a-z_]+=([A-Za-z ]+|[0-9]+,[0-9]+)+", test4)]))

output:

{'versandkostenfrei': 'Ja',
 'delivery_time': 'sofort lieferbar',
 'instantly_deliverable': 'true',
 'spannung': '7,2 Volt'}
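Both answers depend on what the keys or values look like (the second one only accepts letters, spaces, and "digits,digits" in values). A different technique, not taken from either answer, is to split only at commas that are immediately followed by `key=`, using a lookahead, and then cut each chunk at its first "=". This is a sketch assuming keys are runs of word characters; `parse_pairs` is a hypothetical helper name. It also covers the case from the question where `spannung=7,2 Volt` sits in the middle of the string:

```python
import re


def parse_pairs(text: str) -> dict:
    """Hypothetical helper: parse 'k=v,k=v' where values may contain commas.

    Assumes keys are runs of word characters and that a value never contains
    a comma immediately followed by 'word='.
    """
    # Split only at commas that are followed by a key and "=".
    chunks = re.split(r",(?=\w+=)", text)
    # partition cuts at the first "=", so any later "=" stays in the value.
    return {key: value for key, _, value in (c.partition("=") for c in chunks)}


test4 = "versandkostenfrei=Ja,delivery_time=sofort lieferbar,instantly_deliverable=true,spannung=7,2 Volt"
print(parse_pairs(test4))

# Same kind of data with the comma-containing value in the middle.
middle = "versandkostenfrei=Ja,spannung=7,2 Volt,delivery_time=sofort lieferbar"
print(parse_pairs(middle))
```

Because the lookahead does not consume the key, each chunk still starts with its own `key=`, which keeps the second step a simple partition.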