python, split, regex and combine re

I have data in key-value format:

key=1234 key1="value in text"

I want to create a single regex that extracts the value of an individual key.

For example:

key={regex} must return 1234
key1={regex} must return "value in text"

regex="key=\"(.*?)\"|key=([^ ]*)"

I have tried this regex, but it is not working. Could you please help me?

I want to split the string so that the result comes out in tabular format, using regex and Spark:

key  | key1
1234 | value in text

CodePudding user response：

You can use the PyPI regex library with code like this:

import regex
text = 'key=1234 key1="value in text"'
# key = 'key1' # => value in text
key = 'key' # => 1234
pattern = fr'\b{regex.escape(key)}=(?|"([^"]*)"|(\S*))'
match = regex.search(pattern, text)
if match:
    print(match.group(1)) # => 1234

See the online Python demo. Details:

\b - a word boundary
{regex.escape(key)} - the key passed to the regex
= - an equal sign
(?|"([^"]*)"|(\S*)) - a branch reset group matching
- "([^"]*)" - a " char, then zero or more chars other than " captured into Group 1 and then a " char
- | - or
- (\S*) - Group 1 (again, as it is a branch reset group): zero or more non-whitespace chars.

Here is my "Branch reset groups - capture different patterns into same groups" YT video showcasing the use of branch reset groups.
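
If installing the third-party regex package is not an option, roughly the same lookup can be sketched with the standard re module. re has no branch reset groups, so this version uses two ordinary capture groups and returns whichever one took part in the match. The helper name extract_value is only an illustration and is not part of the answer above.

```python
import re

def extract_value(text, key):
    """Return the value stored under `key`, or None if the key is absent."""
    # Group 1 captures a quoted value, group 2 a bare (unquoted) one.
    pattern = rf'\b{re.escape(key)}=(?:"([^"]*)"|(\S*))'
    match = re.search(pattern, text)
    if match is None:
        return None
    # Only one of the two groups participates in any given match.
    return match.group(1) if match.group(1) is not None else match.group(2)

text = 'key=1234 key1="value in text"'
print(extract_value(text, 'key'))   # 1234
print(extract_value(text, 'key1'))  # value in text
```

This is also why the pattern in the question looks like it fails: the quoted and unquoted values end up in different group numbers, so reading a single fixed group returns None for one of the two cases.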

CodePudding user response：

If the contents of the string are valid, i.e. every value after a key is either a bare literal or enclosed in quotation marks, then I would prefer to parse the string into a dictionary and look up the values you want:

import re

string = 'key=1234 key1="value in text"'
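# Turn each `key=` into `"key":`, prefixing ", " when whitespace precedes it
replace = lambda x: (', ' if x.group(1) else '') + f'"{x.group(2)}":'

# Caution: eval executes arbitrary code; use it only on trusted input
my_dict = eval(re.sub(r'(\s)?(\w+)=', replace, f"{{{string}}}"))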

my_dict['key']
# out[23] 1234

my_dict['key1']
# out[24] 'value in text'
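
When the input cannot be fully trusted, a similar dictionary can be built without eval. The following is a minimal sketch using only the standard re module. The names PAIR and parse_pairs are made up for this example, and unlike the eval version every value stays a string (so 1234 comes back as '1234').

```python
import re

# Key in group 1, quoted value in group 2, bare (unquoted) value in group 3.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')

def parse_pairs(text):
    """Collect every key=value pair in `text` into a dict of strings."""
    result = {}
    for m in PAIR.finditer(text):
        key, quoted, bare = m.groups()
        result[key] = quoted if quoted is not None else bare
    return result

row = parse_pairs('key=1234 key1="value in text"')
print(row)  # {'key': '1234', 'key1': 'value in text'}
```

The resulting dict has one entry per key, which maps naturally onto one row of the tabular layout asked for in the question, with the keys as column names.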