I need to parse a line similar to this one:
'''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''
The line is much longer, but the pattern is the same.
Basically, I need a list (or dict) of key/value pairs, something like:
["'identifier', ''d6e461c5-fd55-42cb-b3e8-40072670fd0f''", "'name', ''some_name2''", "'identifier', ''d6e461c5-fd55-42cb-b3e8-40072670fd0f''", "'name', ''some_name3''", "'value', 'value_without_quotes'"]
I ended up with the following regular expression:
r'Object{(.+?)=(.+?)}'
It works only when the object contains a single key=value pair. I expected something like
((.+?)=(.+?),)+
to work, but it doesn't. For example,
re.match(r'Object{((.+?)=(.+?),?)+}', line3).groups()
Gives me:
("some_name3', value=value_without_quotes", "some_name3', value", 'value_without_quotes')
As you can see, 'value=value_without_quotes' ended up inside a single group, and only the last pair is reported. r'Object{(([^=]+?)=(.+?),?)+}' doesn't work either.
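The reason only one pair comes back: in Python's `re`, a capturing group inside a repeated `(...)+` is overwritten on each iteration, so `.groups()` reports only the captures from the final iteration. A minimal sketch on a shortened, made-up sample string shows this, and shows that matching a single-pair pattern repeatedly with `re.findall` (instead of repeating a group) returns every pair. The pattern `(\w+)=([^,}]+)` is an illustrative assumption: it expects word-character keys and values without commas.

```python
import re

line = "Object{a='1', b='2', c=3}"  # hypothetical shortened sample

# Repeated group: each iteration overwrites groups 1-3,
# so only the captures from the last iteration survive.
m = re.match(r"Object{((.+?)=(.+?),?)+}", line)
print(m.groups())

# No repetition in the pattern itself: findall applies it again and again
# across the string and returns one (key, value) tuple per match.
pairs = re.findall(r"(\w+)=([^,}]+)", line)
print(pairs)
```

The lazy `.+?` in the repeated version also explains the odd group contents: it lets a "key" swallow part of the previous value and a comma, exactly as in the output above.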
So the question is: how do I repeat the above pattern in sequence? The problem is that I don't know whether a value contains quotes, symbols or digits.
Thank you
Answer:
You can solve this more simply, without a regular expression:
sentence = '''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''
listing = [couple.split("=") for couple in sentence.split(",")]
Flatten the list:
listing = [y for x in listing for y in x]
This gives:
['Object{identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", ' name', "'some_name2'", ' identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", ' name', "'some_name3'", ' value', 'value_without_quotes}']
Then you just have to strip() each item and remove "Object{" and "}":
result = [x.strip().replace("Object{", "").replace("}","") for x in listing]
The final result is:
['identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name2'", 'identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name3'", 'value', 'value_without_quotes']
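If you want actual key/value pairs rather than a flat list, here is a sketch along the same lines. It splits each chunk only at its first `=` (so a value containing `=` stays whole) and removes the `Object{` prefix and `}` suffix once, assuming the string always starts and ends with them, instead of deleting braces from every item. Like the approach above, it still assumes no value contains a comma.

```python
sentence = '''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''

# Assumption: the string always starts with "Object{" and ends with "}".
body = sentence[len("Object{"):-1]

# Split into "key=value" chunks, then split each chunk at its first "=" only,
# stripping surrounding whitespace from key and value.
pairs = [tuple(part.strip() for part in chunk.split("=", 1))
         for chunk in body.split(",")]
print(pairs)
```

Note that `dict(pairs)` would keep only the last `identifier` and `name`, because this string repeats keys; a list of tuples preserves all of them in order.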
Answer:
import re
line3 = '''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''
pattern = r'[{\s](.+?)=(.+?)[}\s,]'
match = re.findall(pattern, line3)  # list of (key, value) tuples
print([item for key_value_pair in match for item in key_value_pair])
Output:
['identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name2'", 'identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name3'", 'value', 'value_without_quotes']
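Because the value group stops at the first character in `[}\s,]`, a quoted value that contains a space or comma would be cut short. If that can happen, a sketch that treats a single-quoted value as one unit may help. It assumes values are quoted with single quotes (with no escaped quotes inside) or are unquoted and free of commas; the `note='a, b c'` sample is hypothetical.

```python
import re

# A key, "=", then either a single-quoted value (which may contain commas
# and spaces) or an unquoted value running up to the next "," or "}".
PAIR = re.compile(r"(\w+)=('[^']*'|[^,}]*)")

def parse_object(line):
    """Return (key, value) tuples in order; repeated keys are kept."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(line)]

line = "Object{identifier='abc', note='a, b c', value=value_without_quotes}"
print(parse_object(line))
```

Trying the quoted alternative first matters: it consumes the whole `'a, b c'` before the unquoted branch could stop at the inner comma.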