python, split, regex and combine re

Time:11-13

I have data in key-value format:

key=1234 key1="value in text"

I want to create a single regex that splits out the value of an individual key.

For example:

  • key={regex} must return 1234
  • key1={regex} must return "value in text"
regex="key=\"(.*?)\"|key=([^ ]*)"

I have tried this regex, but it is not working. Could you please help me?

I want to split the string so that I get the result in tabular format, using regex and Spark:

key   | key1
Value | Value in text
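Spark itself is not shown here. As a plain-Python sketch of the parsing step only (the `PAIR` pattern, the `parse_record` helper and the second input line are hypothetical), one regex used with `re.findall` can turn each line into a row keyed by column name, which can then be printed as the table above:

```python
import re

# One regex for every pair: group 1 is the key, group 2 a quoted value,
# group 3 an unquoted value. findall returns '' for the group that did not take part.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')


def parse_record(line: str) -> dict:
    # Use the quoted value when there is one, otherwise the unquoted one.
    return {k: quoted or bare for k, quoted, bare in PAIR.findall(line)}


lines = [
    'key=1234 key1="value in text"',
    'key=5678 key1="another value"',  # hypothetical second record
]
rows = [parse_record(line) for line in lines]
columns = list(rows[0])  # column order taken from the first record

print(' | '.join(columns))
for row in rows:
    print(' | '.join(row.get(c, '') for c in columns))
```

In PySpark, `pyspark.sql.functions.regexp_extract` applies a Java regular expression to a column and returns one capture group; Java regex has no branch reset groups, so a per-key pattern there would also need separate groups for the quoted and unquoted cases.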

Answer:

You can use the PyPI regex library with code like this:

import regex
text = 'key=1234 key1="value in text"'
# key = 'key1' # => value in text
key = 'key' # => 1234
pattern = fr'\b{regex.escape(key)}=(?|"([^"]*)"|(\S*))'
match = regex.search(pattern, text)
if match:
    print(match.group(1)) # => 1234

See the online Python demo. Details:

  • \b - a word boundary
  • {regex.escape(key)} - the key passed to the regex
  • = - an equal sign
  • (?|"([^"]*)"|(\S*)) - a branch reset group matching
    • "([^"]*)" - a " char, then zero or more chars other than " captured into Group 1 and then a " char
    • | - or
    • (\S*) - Group 1 (again, as it is a branch reset group): zero or more non-whitespace chars.
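Python's built-in re module does not support branch reset groups, which is why the pattern above needs the third-party regex package. As a rough stdlib-only sketch (the helper name `get_value` is just for illustration), the same two alternatives can capture into two separate groups, and the code then picks whichever group took part in the match:

```python
import re

text = 'key=1234 key1="value in text"'


def get_value(key: str, text: str):
    # Same alternatives as the branch reset pattern, but with two ordinary groups:
    # group 1 for a quoted value, group 2 for an unquoted one.
    pattern = rf'\b{re.escape(key)}=(?:"([^"]*)"|(\S*))'
    match = re.search(pattern, text)
    if match is None:
        return None
    # Only one of the two groups participates; return the one that matched.
    return match.group(1) if match.group(1) is not None else match.group(2)


print(get_value('key', text))   # 1234
print(get_value('key1', text))  # value in text
```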

Here is my "Branch reset groups - capture different patterns into same groups" YouTube video showcasing the use of branch reset groups.

Answer:

If the contents of the string are valid, i.e. each value is a valid Python literal (a number, or text enclosed in quotation marks), then I would prefer to parse the string into a dictionary and get the values you want:

import re

string = 'key=1234 key1="value in text"'
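# Quote each key name and add ':'; a preceding space marks a new entry, so add ', '
replace = lambda x: (', ' if x.group(1) else '') + f'"{x.group(2)}":'

# '{key=1234 key1="value in text"}' -> '{"key":1234, "key1":"value in text"}'
my_dict = eval(re.sub(r'(\s)?(\w+)=', replace, f"{{{string}}}"))

my_dict['key']
# => 1234

my_dict['key1']
# => 'value in text'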
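Because eval executes whatever expression the string contains, a slightly safer sketch of the same substitution idea (an adaptation, not this answer's original code) passes the rewritten string to `ast.literal_eval`, which only accepts Python literals. Printing the intermediate string also shows what the substitution produces:

```python
import ast
import re

string = 'key=1234 key1="value in text"'


def to_key(m: re.Match) -> str:
    # Quote the key name and add ':'; a preceding space means a new entry, so add ', '.
    return (', ' if m.group(1) else '') + f'"{m.group(2)}":'


# Wrap the input in braces and rewrite every `name=` into `"name":`.
literal = re.sub(r'(\s)?(\w+)=', to_key, f"{{{string}}}")

# ast.literal_eval parses literals only, so unlike eval it will not run arbitrary code.
my_dict = ast.literal_eval(literal)

print(literal)          # {"key":1234, "key1":"value in text"}
print(my_dict['key'])   # 1234
print(my_dict['key1'])  # value in text
```

The same validity caveat applies: an unquoted value that is not a Python literal (for example `key=abc`) makes `ast.literal_eval` raise `ValueError`.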