I have a list of strings. Each element represents a field as key value separated by space:
listA = [
'abcd1-2 4d4e',
'xyz0-1 551',
'foo 3ea',
'bar1 2bd',
'mc-mqisd0-2 77a'
]
Behavior
I need to return a dict
out of this list with expanding the keys like 'xyz0-1'
by the range denoted by 0-1 into multiple keys like abcd1
and abcd2
with the same value like 4d4e
.
It should run as part of an Ansible plugin, where Python 2.7 is used.
Expected
The end result would look like the dict below:
{
abcd1: 4d4e,
abcd2: 4d4e,
xyz0: 551,
xyz1: 551,
foo: 3ea,
bar1: 2bd,
mc-mqisd0: 77a,
mc-mqisd1: 77a,
mc-mqisd2: 77a,
}
Code
I have created below function. It is working with Python 3.
def listFln(listA):
import re
fL = []
for i in listA:
aL = i.split()[0]
bL = i.split()[1]
comp = re.sub('^(. ?)(\d -\d )?$',r'\1',aL)
cmpCountR = re.sub('^(. ?)(\d -\d )?$',r'\2',aL)
if cmpCountR.strip():
nStart = int(cmpCountR.split('-')[0])
nEnd = int(cmpCountR.split('-')[1])
for j in range(nStart,nEnd 1):
fL.append(comp str(j) ' ' bL)
else:
fL.append(i)
return(dict([k.split() for k in fL]))
Error
In lower python versions like Python 2.7. this code throws an "unmatched group" error:
cmpCountR = re.sub('^(. ?)(\d -\d )?$',r'\2',aL)
File "/usr/lib64/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/lib64/python2.7/re.py", line 275, in filter
return sre_parse.expand_template(template, match)
File "/usr/lib64/python2.7/sre_parse.py", line 800, in expand_template
raise error, "unmatched group"
Anything wrong with the regex here?
CodePudding user response:
Here's a simpler version using findall
instead of sub
, successfully tested on 2,7. It also directly creates the dict instead of first building a list:
mylist=[
'abcd1-2 4d4e',
'xyz0-1 551',
'foo 3ea',
'bar1 2bd',
'mc-mqisd0-2 77a'
]
def listFln(listA):
import re
fL = {}
for i in listA:
aL = i.split()[0]
bL = i.split()[1]
comp = re.findall('^(. ?)(\d -\d )?$',aL)[0]
if comp[1]:
nStart = int(comp[1].split('-')[0])
nEnd = int(comp[1].split('-')[1])
for j in range(nStart,nEnd 1):
fL[comp[0] str(j)] = bL
else:
fL[comp[0]] = bL
return fL
print(listFln(mylist))
# {'abcd1': '4d4e',
# 'abcd2': '4d4e',
# 'xyz0': '551',
# 'xyz1': '551',
# 'foo': '3ea',
# 'bar1': '2bd',
# 'mc-mqisd0': '77a',
# 'mc-mqisd1': '77a',
# 'mc-mqisd2': '77a'}
CodePudding user response:
Used Python 2.7 to reproduce:
Both patterns compile
import re
# both seem identical
regex1 = '^(. ?)(\d -\d )?$'
regex2 = '^(. ?)(\d -\d )?$'
# also the compiled pattern is identical, see hash
re.compile(regex1) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
re.compile(regex2) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
Note: The compiled pattern using re.compile()
saves time when re-using multiple times like in this loop.
Fix: test for groups found
The error-message indicates that there are groups that aren't matched.
Put it other: In the matching result of re.sub
(docs to 2.7) there are references to groups like the second capturing group (\2
) that have not been found or captured in the given string input:
sre_constants.error: unmatched group
To fix this, we should test on groups that were found in the match.
Therefore we use re.match(regex, str)
or the compiled variant pattern.match(str)
to create a Match
object, then Match.groups()
to return all found groups as tuple.
import re
regex = '^(. ?)(\d -\d )?$'
pattern = re.compile(regex) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
def listFln(listA):
fL = []
for i in listA:
aL = i.split()[0]
bL = i.split()[1]
# test for match and groups found
match = pattern.match(aL)
print("DEBUG groups:", match.groups()) # tuple containing all the subgroups of the match,
# watch: the 3 iteration has only group(1)
# break to next iteration here, if no 2nd group
if not match or not match.group(2):
continue
comp = re.sub(pattern, r'\1', aL)
cmpCountR = re.sub(pattern, r'\2', aL)
if cmpCountR.strip():
parts = cmpCountR.split('-')
nStart = int(parts[0])
nEnd = int(parts[1])
for j in range(nStart,nEnd 1):
fL.append(comp str(j) ' ' bL)
else:
fL.append(i)
return dict([k.split() for k in fL])
listA = [
'abcd1-2 4d4e',
'xyz0-1 551',
'foo 3ea',
'bar1 2bd',
'mc-mqisd0-2 77a'
]
as_dict = listFln(listA)
print("resulting dict:", as_dict)
Prints:
('DEBUG groups:', ('abcd', '1-2'))
('DEBUG groups:', ('xyz', '0-1'))
('DEBUG groups:', ('foo', None))
('DEBUG groups:', ('bar1', None))
('DEBUG groups:', ('mc-mqisd', '0-2'))
('resulting dict:', {'mc-mqisd2': '77a', 'mc-mqisd0': '77a', 'mc-mqisd1': '77a', 'xyz1': '551', 'xyz0': '551', 'abcd1': '4d4e', 'abcd2': '4d4e'})
CodePudding user response:
You could use a single pattern with 4 capture groups, and check if the 3rd capture group value is not empty.
^(.*?)(\d )(?:-(\d ))?\s*(.*)
The pattern matches:
^
Start of string(.*?)
Capture group 1, match any character, as few as possible(\d )
Capture group 2, match 1 digits(?:-(\d ))?
Optionally match-
and capture 1 digits in group 3\s*
Match optional whitespace chars(.*)
Capture group 4, match the rest of the line
Code example (works on Python 2 and Python 3)
import re
strings = [
'abcd1-2 4d4e',
'xyz0-1 551',
'foo 3ea',
'bar1 2bd',
'mc-mqisd0-2 77a'
]
def listFln(listA):
dct = {}
for s in listA:
lst = sum(re.findall(r"^(.*?)(\d )(?:-(\d ))?\s*(.*)", s), ())
if lst:
for i in range(int(lst[1]), (int(lst[2]) if lst[2] else int(lst[1])) 1):
dct[lst[0] str(i)] = lst[3]
return dct
print(listFln(strings))
Output
{
'abcd1': '4d4e',
'abcd2': '4d4e',
'xyz0': '551',
'xyz1': '551',
'foo 3': 'ea',
'bar1': '2bd',
'mc-mqisd0': '77a',
'mc-mqisd1': '77a',
'mc-mqisd2': '77a'
}