Repeating regex in Python

Time:11-15

I need to parse a line similar to this one:

'''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''

The line is much longer, but the pattern is the same.

Basically, I need a list (or dict) with key, value. Something like:

["'identifier', ''d6e461c5-fd55-42cb-b3e8-40072670fd0f''", "'name', ''some_name2''", "'identifier', ''d6e461c5-fd55-42cb-b3e8-40072670fd0f''", "'name', ''some_name3''", "'value', 'value_without_quotes'"]

I ended up with the following regular expression:

r'Object{(.+?)=(.+?)}'

It works only if there is a single key=value pair. I expected something like

((.+?)=(.+?),)+

to work, but it doesn't. For example,

re.match(r'Object{((.+?)=(.+?),?)+}', line3).groups()

Gives me:

("some_name3', value=value_without_quotes", "some_name3', value", 'value_without_quotes')

As you can see, 'value=value_without_quotes' ended up inside the captured groups. r'Object{(([^=]+?)=(.+?),?)+}' doesn't work either.

So the question is: how do I repeat the pattern above in sequence? The catch is that I don't know whether a value contains quotes, symbols, or digits.

Thank you

CodePudding user response:

You can approach this problem in a simpler way, without a regex:

sentence = '''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''
listing = [couple.split("=") for couple in sentence.split(",")]

Flatten the list:

listing = [y for x in listing for y in x]

And you will obtain something like:

['Object{identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", ' name', "'some_name2'", ' identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", ' name', "'some_name3'", ' value', 'value_without_quotes}']

Then you just have to strip() each item and remove "Object{" and "}":

result = [x.strip().replace("Object{", "").replace("}","") for x in listing]

Final result is:

['identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name2'", 'identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name3'", 'value', 'value_without_quotes']
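
The steps above can be collected into one helper. This is only a minimal sketch, not part of the original answer: the name `parse_pairs` is hypothetical, and it assumes that no value contains a comma and that the line is wrapped in `Object{` and `}`. It splits each pair on the first `=` only, so an `=` inside a value is preserved, and it returns (key, value) tuples rather than a flat list.

```python
def parse_pairs(line):
    """Split 'Object{k=v, k=v}' into a list of (key, value) tuples.

    Hypothetical helper; assumes values never contain a comma.
    """
    body = line.strip()
    # Drop the 'Object{' prefix and the trailing '}' if both are present.
    if body.startswith("Object{") and body.endswith("}"):
        body = body[len("Object{"):-1]
    pairs = []
    for couple in body.split(","):
        if "=" not in couple:
            continue  # skip empty or malformed chunks
        # maxsplit=1 keeps any '=' inside the value intact.
        key, value = couple.split("=", 1)
        pairs.append((key.strip(), value.strip()))
    return pairs


sentence = "Object{identifier='d6e4', name='some_name2', value=value_without_quotes}"
print(parse_pairs(sentence))
```

Because keys such as identifier and name repeat in the input, passing the result to dict() would keep only the last value for each key, so a list of tuples is the safer shape here.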

CodePudding user response:

import re

line3 = '''Object{identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name2', identifier='d6e461c5-fd55-42cb-b3e8-40072670fd0f', name='some_name3', value=value_without_quotes}'''

# Each match is one key=value pair delimited by '{', whitespace, ',' or '}'.
pattern = r'[{\s](.+?)=(.+?)[}\s,]'
match = re.findall(pattern, line3)
# Flatten the list of (key, value) tuples into a single list.
[item for key_value_pair in match for item in key_value_pair]

Outputs

['identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name2'", 'identifier', "'d6e461c5-fd55-42cb-b3e8-40072670fd0f'", 'name', "'some_name3'", 'value', 'value_without_quotes']
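
Some background on why the repeated-group attempts in the question return only one pair: when a capture group sits inside a repetition, Python's `re` keeps only the text from the group's final iteration. Matching a single pair and letting `re.findall` do the repetition avoids that. The following is a minimal sketch under assumptions: the sample line is shortened, the lazy quantifier is taken to be `+?`, and the names `sample`, `m`, and `pairs` are illustrative only.

```python
import re

sample = "Object{identifier='abc', name='some_name3', value=value_without_quotes}"

# A repeated capture group retains only its last iteration,
# so only the final key=value pair is visible in groups().
m = re.match(r"Object{(?:(\w+)=([^,}]+),?\s*)+}", sample)
print(m.groups())

# findall applies a single-pair pattern repeatedly and
# returns one (key, value) tuple per match.
pairs = re.findall(r"[{\s](.+?)=(.+?)[}\s,]", sample)
print(pairs)
```

Note that the `findall` pattern stops a value at the first whitespace, comma, or closing brace, so it would cut short a quoted value that contains any of those characters.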
