I can't figure out a solution to this problem.
I have this example string:
test4 = "versandkostenfrei=Ja,delivery_time=sofort lieferbar,instantly_deliverable=true,spannung=7,2 Volt"
I'd like to convert this into a dict where everything before an = sign is a key and everything after it, up to the next comma, is the value. The main problem is that some of the values (after the equals sign) contain a comma themselves. Also, looking at the entire data set, a pair like spannung=7,2 Volt can appear somewhere in the middle of the string, not only at the end.
Desired output:
{
"versandkostenfrei": "Ja",
"delivery_time": "sofort lieferbar",
"instantly_deliverable": "true",
"spannung": "7,2 Volt"
}
It's not important if the bool value is also surrounded by double quotes or not.
CodePudding user response:
Split the string with a regex that matches each key followed by "=" (with an optional leading ","), capturing the key as a group.
Because the key is captured, re.split keeps it in the result, so you get a list of alternating keys and values (the first item is an empty string).
Then collect them into key/value tuples to create a dict.
import re
test4 = "versandkostenfrei=Ja,delivery_time=sofort lieferbar,instantly_deliverable=true,spannung=7,2 Volt"
parts = re.split(r",?([\w_]+)=", test4)
output = dict((parts[i], parts[i + 1]) for i in range(1, len(parts), 2))
print(output)
This creates:
{
'versandkostenfrei': 'Ja',
'delivery_time': 'sofort lieferbar',
'instantly_deliverable': 'true',
'spannung': '7,2 Volt'
}
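To make the intermediate step visible, here is a minimal sketch of the same split-and-pair idea wrapped in a function. The name parse_pairs and the sample string are made up for illustration; the sample puts the comma-containing value in the middle of the string, as the question says can happen. It assumes keys consist only of word characters and that no value contains something that looks like ",key=".

```python
import re

def parse_pairs(text):
    # Hypothetical helper name, not from the original answer.
    # Split on an optional comma followed by "key="; the capturing
    # group makes re.split keep each key in the resulting list.
    parts = re.split(r",?(\w+)=", text)
    # parts[0] is the empty string before the first key; after that,
    # keys sit at odd indexes and values at even indexes.
    return dict(zip(parts[1::2], parts[2::2]))

# Illustrative sample: the value with a comma is in the middle.
sample = "versandkostenfrei=Ja,spannung=7,2 Volt,delivery_time=sofort lieferbar"

print(re.split(r",?(\w+)=", sample))
# ['', 'versandkostenfrei', 'Ja', 'spannung', '7,2 Volt', 'delivery_time', 'sofort lieferbar']
print(parse_pairs(sample))
```

The comma inside "7,2 Volt" is not treated as a separator because the text after it ("2 Volt") is not a word immediately followed by "=".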
CodePudding user response:
import re
dict(map(lambda x: x.split("="), [x.group() for x in re.finditer(r'[a-z_]+\=([A-Za-z ]+|([0-9]+,[0-9]+))+', test4, re.DOTALL)]))
Output:
{'versandkostenfrei': 'Ja',
'delivery_time': 'sofort lieferbar',
'instantly_deliverable': 'true',
'spannung': '7,2 Volt'}