I have data in key-value format:
key=1234 key1="value in text"
I want to create a single regex that extracts the value of an individual key.
For example:
- key={regex} must return 1234
- key1={regex} must return "value in text"
regex="key=\"(.*?)\"|key=([^ ]*)"
I have tried this regex, but it is not working. Could you please help me?
I want to split the string so that I get the result in tabular format, using regex and Spark:
key   | key1
Value | Value in text
CodePudding user response:
You can use the PyPI regex library with code like this:
import regex
text = 'key=1234 key1="value in text"'
# key = 'key1' # => value in text
key = 'key' # => 1234
pattern = fr'\b{regex.escape(key)}=(?|"([^"]*)"|(\S*))'
match = regex.search(pattern, text)
if match:
    print(match.group(1)) # => 1234
See the online Python demo. Details:
- \b - a word boundary
- {regex.escape(key)} - the key passed to the regex
- = - an equal sign
- (?|"([^"]*)"|(\S*)) - a branch reset group matching
  - "([^"]*)" - a " char, then zero or more chars other than " captured into Group 1, and then a " char
  - | - or
  - (\S*) - Group 1 (again, as it is a branch reset group): zero or more non-whitespace chars.
Here is my "Branch reset groups - capture different patterns into same groups" YT video showcasing the use of branch reset groups.
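If installing a third-party module is not an option, roughly the same lookup can be sketched with the standard-library re module. re has no branch reset groups, so this version uses two ordinary capture groups and returns whichever one took part in the match. The helper name get_value is made up for illustration and is not part of either library.

```python
import re

def get_value(text, key):
    """Return the value stored under `key`, or None if the key is absent."""
    # Group 1 captures a quoted value, group 2 a bare (unquoted) token.
    pattern = rf'\b{re.escape(key)}=(?:"([^"]*)"|(\S*))'
    match = re.search(pattern, text)
    if match is None:
        return None
    # Only one of the two groups participates in a successful match.
    return match.group(1) if match.group(1) is not None else match.group(2)

text = 'key=1234 key1="value in text"'
print(get_value(text, 'key'))   # => 1234
print(get_value(text, 'key1'))  # => value in text
```

The trade-off compared with the branch reset version is the extra "which group matched" check. The pattern itself is otherwise the same.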
CodePudding user response:
If the contents of the string are well-formed, i.e. every text value after a key is enclosed in quotation marks, I would prefer to parse the string into a dictionary and look up the values you want:
import re
string = 'key=1234 key1="value in text"'
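# Turn each `name=` into `"name":`, prefixing ', ' when whitespace precedes it
replace = lambda x: (', ' if x.group(1) else '') + f'"{x.group(2)}":'
# eval runs arbitrary code, so use it on trusted input only (ast.literal_eval is a safer alternative)
my_dict = eval(re.sub(r'(\s)?(\w+)=', replace, f"{{{string}}}"))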
my_dict['key']
# out[23] 1234
my_dict['key1']
# out[24] 'value in text'
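As an alternative sketch that avoids eval, all pairs can be collected in one pass with re.finditer. The names PAIR and parse_pairs are hypothetical, and unlike the eval approach every value stays a string (so 1234 comes back as '1234'). It assumes values are either double-quoted or contain no whitespace.

```python
import re

# A key made of word chars, '=', then either a quoted value or a bare token.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')

def parse_pairs(text):
    """Collect every key=value pair in `text` into a dict of strings."""
    result = {}
    for m in PAIR.finditer(text):
        key, quoted, bare = m.groups()
        # Prefer the quoted capture; fall back to the bare token.
        result[key] = quoted if quoted is not None else bare
    return result

row = parse_pairs('key=1234 key1="value in text"')
print(row)  # => {'key': '1234', 'key1': 'value in text'}
```

Each resulting dict maps naturally onto one row of a table, with the keys as column names, which is the shape the question asks for.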