I have a situation where a user can enter commands with optional key/value pairs, and a value may contain spaces.
Here are 4 different forms of user input, where key and value are separated by an = sign and values contain spaces:
"cmd=create-folder name=SelfServe - Test ride"
"cmd=create-folder name=SelfServe - Test ride server=prd"
"cmd=create-folder name=cert - Test ride server=dev site=Service"
"cmd=create-folder name=cert - Test ride server=dev site=Service permission=locked"
Requirement: I am trying to parse this string and split it into a dictionary based on the keys and values present in the string.
If the user enters the first form of statement, that would produce a dictionary like:
query_dict = {
'cmd' : 'create-folder',
'name' : 'SelfServe - Test ride'
}
If the user enters the second form of statement, that would add the additional key/value pair:
query_dict = {
'cmd' : 'create-folder',
'name' : 'SelfServe - Test ride',
'server' : 'prd'
}
If the user enters the third form of statement, that would produce:
query_dict ={
'cmd' : 'create-folder',
'name' : 'cert - Test ride',
'server' : 'dev',
'site': 'Service'
}
The fourth form produces the dictionary with the key/value split like below:
query_dict ={
'cmd' : 'create-folder',
'name' : 'cert - Test ride',
'server' : 'dev',
'site': 'Service',
'permission' : 'locked' }
The idea is to parse a string where key and value are separated by an = symbol, where the values can contain one or more spaces, and extract the matching key/value pairs.
I tried multiple methods but was unable to figure out a single generic regular expression pattern that can match/extract from any string with this kind of pattern.
I appreciate your help.
I tried several patterns mapped to the different possible user inputs, but that is not a scalable approach. Example:
I created three patterns to match three varieties of user input, but it would be nice to have one generic pattern that can match any combination of key=value pairs in a string (I am hard-coding the keys in the patterns, which is not ideal):
'(cmd=create-folder).*(name=.*).*' ,
'(cmd=create-pfolder).*(name=.*).*(server=.*).*',
'(cmd=create-pfolder).*(name=.*).*(server=.*).*(site=.*)'
CodePudding user response:
I would suggest using re.split, and then zip to feed the dict constructor:
import re

def get_dict(s):
    # Split on every "key=" token; the capture group keeps each key in the result
    parts = re.split(r"\s*(\w+)=", s)
    return dict(zip(parts[1::2], parts[2::2]))
Example runs:
print(get_dict("cmd=create-folder name=SelfServe - Test ride"))
print(get_dict("cmd=create-folder name=SelfServe - Test ride server=prd"))
print(get_dict("cmd=create-folder name=cert - Test ride server=dev site=Service"))
print(get_dict("cmd=create-folder name=cert - Test ride server=dev site=Service permission=locked"))
Outputs:
{'cmd': 'create-folder', 'name': 'SelfServe - Test ride'}
{'cmd': 'create-folder', 'name': 'SelfServe - Test ride', 'server': 'prd'}
{'cmd': 'create-folder', 'name': 'cert - Test ride', 'server': 'dev', 'site': 'Service'}
{'cmd': 'create-folder', 'name': 'cert - Test ride', 'server': 'dev', 'site': 'Service', 'permission': 'locked'}
Explanation
Using this input as example:
"cmd=create-folder name=SelfServe - Test ride"
The split regex identifies these parts:
"cmd=create-folder name=SelfServe - Test ride"
 ^^^^             ^^^^^^
The strings that are not matched by it will end up as results, so we have these:
"", "create-folder", "SelfServe - Test ride"
The first string is empty, because it is what precedes the first match.
Now, as the regex has a capture group, the string captured by that group is also returned in the result list, at odd indices. So parts ends up like this:
["", "cmd", "create-folder", "name", "SelfServe - Test ride"]
The keys we are interested in occur at odd indices. We can get those with parts[1::2], where 1 is the starting index and 2 is the step.
The corresponding values for those keys occur at even indices, ignoring the empty string at index 0. So we get those with parts[2::2]. With the call to zip, we pair those keys and values together as we want them.
Finally, the dict constructor can take an argument with key/value pairs, which is exactly what that zip call provides.
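The steps above can be traced with a short sketch. The variable names (s, keys, values, query_dict) are only illustrative, and the input is the second form from the question:

```python
import re

s = "cmd=create-folder name=SelfServe - Test ride server=prd"

# Splitting on each "key=" token; the capture group keeps the keys in the list.
parts = re.split(r"\s*(\w+)=", s)

# parts[0] is the empty string that precedes the first match.
keys = parts[1::2]    # odd indices: the captured keys
values = parts[2::2]  # even indices from 2 onwards: the text between matches

query_dict = dict(zip(keys, values))
print(parts)
print(query_dict)
```

Note that \w+ only accepts letters, digits, and underscores in a key, so a key such as display-name would need a wider character class.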
CodePudding user response:
Try with the following regex:
(\S+)=([^=]+?)(?=\s\S+=|$)
Regex Explanation:
(\S+): the first group holds one or more non-space characters
=: followed by an equals sign
([^=]+?): the second group holds one or more non-equals characters (as few as possible)
(?=\s\S+=|$): followed by either a space and word=, or the end of the string
Check the regex demo here.
Note: the assumption here is that your key (the left-hand side of the pair) does not contain spaces.
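To see what that assumption means in practice, here is a small sketch with a made-up input in which the intended key "full name" contains a space. The input string and variable names are hypothetical, not taken from the question:

```python
import re

pattern = r"(\S+)=([^=]+?)(?=\s\S+=|$)"

# Hypothetical input: the user meant the key to be "full name".
text = "cmd=rename full name=John Smith"

pairs = re.findall(pattern, text)
print(pairs)
# The word "full" is absorbed into the previous value, because a key
# can only be a run of non-space characters directly before "=".
```

So an input like this does not raise an error; it is parsed with different keys and values than the user intended.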
You can then use this python code to retrieve your groups:
import re
strings = [
"cmd=create-folder name=SelfServe - Test ride",
"cmd=create-folder name=SelfServe - Test ride server=prd",
"cmd=create-folder name=cert - Test ride server=dev site=Service",
"cmd=create-folder name=cert - Test ride server=dev site=Service permission=locked"
]
pattern = r'(\S+)=([^=]+?)(?=\s\S+=|$)'
for string in strings:
    print(string)
    # Each match is a (key, value) tuple from the two capture groups
    for match in re.findall(pattern, string):
        print(f'Group1: {match[0]} \t Group2: {match[1]}')
    print()
Check the python demo here.
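Since the question asks for a dictionary, the (key, value) tuples returned by re.findall can be passed straight to dict. This is a minimal sketch; the names PATTERN and parse_query are made up for illustration:

```python
import re

# Same regex as above, compiled once for reuse.
PATTERN = re.compile(r"(\S+)=([^=]+?)(?=\s\S+=|$)")

def parse_query(text):
    """Return a dict built from the key=value pairs found in text."""
    # findall yields one (key, value) tuple per match, which dict accepts directly.
    return dict(PATTERN.findall(text))

query_dict = parse_query(
    "cmd=create-folder name=cert - Test ride server=dev site=Service permission=locked"
)
print(query_dict)
```

If the same key appears twice in the input, the later value overwrites the earlier one, which is the normal behaviour of dict.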