Let's say I have key: value pairs packed into a single string, where each value can contain an unknown number of spaces:
'address: 123 fake street city: new york state: new york population: 500000'
How do I get
{'address': '123 fake street',
'city': 'new york',
'state': 'new york',
'population': 500000}
OR even lists or tuples to the effect of:
['address', '123 fake street'],
['city', 'new york'...]
(1) Assume keys are always single words followed by a colon, e.g.:
key:
city:
population:
(2) Assume edge cases where the value of address: may be "C/O John Smith @ Building X, S/W"
CodePudding user response:
Try (regex101):
import re
s = "address: C/O John Smith @ Building X, S/W city: new york state: new york population: 500000"
d = dict(re.findall(r"([^\s]+)\s*:\s*(.*?)\s*(?=[^\s]+:|$)", s))
print(d)
Prints:
{
"address": "C/O John Smith @ Building X, S/W",
"city": "new york",
"state": "new york",
"population": "500000",
}
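The question asks for population as the number 500000, while the pattern above returns every value as a string. A minimal sketch of one way to close that gap, assuming only values made up entirely of digits should become integers (the helper name parse_pairs is hypothetical, not part of the answer above):

```python
import re

def parse_pairs(s):
    # Same idea as the pattern above: a key is a run of non-space characters
    # before a colon; its value runs lazily up to the next "key:" or the end.
    pairs = re.findall(r"(\S+)\s*:\s*(.*?)\s*(?=\S+:|$)", s)
    # Assumption: only all-digit values are converted to int;
    # everything else stays a string.
    return {key: int(value) if value.isdigit() else value for key, value in pairs}

s = "address: C/O John Smith @ Building X, S/W city: new york state: new york population: 500000"
print(parse_pairs(s))
```

Values such as "123 fake street" are left alone because they are not purely digits.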
CodePudding user response:
You can try this regex: (\w+): *([\w\\\/\- \.\@\_\| ]+)([^\w:]|$)
but you also have to strip the captured value, since it can carry trailing whitespace when several spaces precede the next key.
import re
my_string = 'address: 123 fake street city: new york state: new york population: 500000'
result = {x.group(1): x.group(2).strip() for x in re.finditer(r'(\b\w+\b): *([\w\\\/\- \.\_\|\@ ]+)([^\w:]|$)', my_string)}
print(result)
Result:
{'address': '123 fake street',
'city': 'new york',
'state': 'new york',
'population': '500000'}
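For the list-of-pairs form mentioned in the question, one possible alternative is to split the string on the keys themselves rather than whitelisting the characters allowed in a value. This is only a sketch, assuming (as stated in the question) that keys are single words followed by a colon and that no value contains a word immediately followed by a colon; split_pairs is a hypothetical helper name:

```python
import re

def split_pairs(s):
    # Split on every "key:" token. Because the key is in a capture group,
    # re.split keeps it in the result alongside the text between keys.
    parts = re.split(r"\s*\b(\w+):\s*", s.strip())
    # parts[0] is whatever precedes the first key (empty here);
    # after that, keys and values alternate.
    return [[key, value] for key, value in zip(parts[1::2], parts[2::2])]

my_string = 'address: 123 fake street city: new york state: new york population: 500000'
print(split_pairs(my_string))
```

Passing the result to dict() gives the dictionary form. Since nothing restricts which characters a value may contain, a value such as "C/O John Smith @ Building X, S/W" should come through with its comma intact.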