I am trying to use re.search (or re.findall) to parse a line and replace each keyword with a value.
My example string is:
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
With Ion1 set to 'Na' and Value1 set to 1.0, I would like the result to be:
processedline = 'Text1 Na Text2 1.0'
To do so, I tried the following code:
import re
ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
result = re.search('<<ALTER(.*)>>', line)
aux_txt = result.group(1).split('=')
var = aux_txt[-1].strip()
value = ParameterDictionary[var]
processedline = re.sub('<<ALTER(.*)>>', str(value), line, flags=re.DOTALL)
However, the value I get for result.group(1) is ', variable = Ion1>> Text2 <<ALTER, variable = Value1', i.e., the pattern does not treat the two keywords independently.
Does anyone have an idea? Thanks in advance!
User response:
That is because your regex matches from the first <<ALTER up to the last >> in the line, instead of stopping at the first >> after Ion1. You need to use the lazy quantifier .*? instead of .* to limit the match.
What .*? does is this: it matches the previous token (here ., any character except a newline) between zero and unlimited times, as few times as possible, expanding only as needed (lazy).
Here is an example with an explanation: https://regex101.com/r/oKyOIn/1
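A minimal sketch comparing the greedy and lazy patterns on the question's line with re.findall (this side-by-side comparison is an illustration, not part of the original answer):

```python
import re

line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'

# Greedy: .* runs as far as it can, so a single match spans both placeholders.
greedy = re.findall(r'<<ALTER(.*)>>', line)
# Lazy: .*? stops at the first '>>' it reaches, so each placeholder matches on its own.
lazy = re.findall(r'<<ALTER(.*?)>>', line)

print(greedy)  # [', variable = Ion1>> Text2 <<ALTER, variable = Value1']
print(lazy)    # [', variable = Ion1', ', variable = Value1']
```

Note that passing the lazy pattern to re.sub with a fixed replacement string would still replace every placeholder with the same value; a per-placeholder lookup needs a loop or a callable replacement.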
User response:
Using .* is too broad: it captures everything between the first <<ALTER and the last >>. Why not use a more specific regexp?
>>> re.findall(r"<<ALTER, variable = (\w+)>>", line)
['Ion1', 'Value1']
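A specific pattern makes the lookup straightforward, but it is also strict about spacing. The sketch below uses the captured names as dictionary keys and contrasts the answer's pattern with a relaxed variant; the relaxed pattern and the compact placeholder format are hypothetical, assuming placeholders might be written without spaces:

```python
import re

ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'

strict = r"<<ALTER, variable = (\w+)>>"
# Hypothetical relaxed variant: optional whitespace around ',' and '='
relaxed = r"<<ALTER,\s*variable\s*=\s*(\w+)>>"

# The captured names double as dictionary keys.
names = re.findall(strict, line)
values = [ParameterDictionary[name] for name in names]
print(names, values)  # ['Ion1', 'Value1'] ['Na', '1.0']

# A placeholder written without spaces only matches the relaxed pattern.
compact = 'Text1 <<ALTER,variable=Ion1>>'
print(re.findall(strict, compact))   # []
print(re.findall(relaxed, compact))  # ['Ion1']
```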
User response:
re.search is the wrong tool for this task: it returns only the first (leftmost) match, or None if no match was found. You should use either re.finditer, which gives an iterator of Match objects, or re.findall, which gives a list of strs or tuples.
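A short sketch of the three functions on the question's line, assuming the specific pattern from the previous answer; the two-group pattern at the end exists only to show when re.findall switches from strings to tuples:

```python
import re

line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
pattern = r'<<ALTER, variable = (\w+)>>'

# re.search stops at the first (leftmost) match.
first = re.search(pattern, line)
print(first.group(1))  # Ion1

# re.finditer yields one Match object per occurrence, including its position.
all_matches = list(re.finditer(pattern, line))
for m in all_matches:
    print(m.group(1), m.span())

# re.findall returns strings for a single group...
names = re.findall(pattern, line)
print(names)  # ['Ion1', 'Value1']

# ...and tuples when the pattern has several groups.
pairs = re.findall(r'<<(\w+), variable = (\w+)>>', line)
print(pairs)  # [('ALTER', 'Ion1'), ('ALTER', 'Value1')]
```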
Also, as already noted, you need to change your pattern <<ALTER(.*)>>, as it matches too much. You might use the non-greedy version, i.e.
<<ALTER(.*?)>>
or, if > is not allowed between << and >>, exploit that as follows:
<<ALTER([^>]*)>>
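The two alternatives agree on the question's line but are not interchangeable. This sketch, using two made-up test strings, shows where they diverge: a placeholder broken across lines, and a placeholder containing a literal >:

```python
import re

line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
lazy = r'<<ALTER(.*?)>>'
negated = r'<<ALTER([^>]*)>>'

# On the question's line both patterns capture the same two groups.
same = re.findall(lazy, line) == re.findall(negated, line)
print(same)  # True

# Line break inside the placeholder: '.' does not match '\n' without re.DOTALL,
# whereas [^>] matches any character except '>', newlines included.
multiline = '<<ALTER, variable =\nIon1>>'
print(re.findall(lazy, multiline))                   # []
print(re.findall(lazy, multiline, flags=re.DOTALL))  # [', variable =\nIon1']
print(re.findall(negated, multiline))                # [', variable =\nIon1']

# A '>' inside the placeholder: the lazy pattern copes, the negated class does not.
with_gt = '<<ALTER, x > 1>>'
print(re.findall(lazy, with_gt))     # [', x > 1']
print(re.findall(negated, with_gt))  # []
```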
User response:
Thanks a lot! It worked perfectly like this:
import re
ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
# collect the variable names referenced by the placeholders
result = re.findall(r'<<ALTER, variable = (\w+)>>', line)
for txt in result:
    # rebuild the placeholder text and replace it with the looked-up value
    aux_txt = f'<<ALTER, variable = {txt}>>'
    value = ParameterDictionary[txt]
    line = re.sub(aux_txt, str(value), line, flags=re.DOTALL)
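Because this loop rebuilds the exact placeholder text, a plain str.replace is enough. re.sub interprets the placeholder as a pattern, which is harmless for these names but would need re.escape if a placeholder ever contained characters such as . or (. A sketch of that variant (an alternative, not the code quoted above):

```python
import re

ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'

for name in re.findall(r'<<ALTER, variable = (\w+)>>', line):
    placeholder = f'<<ALTER, variable = {name}>>'
    # str.replace treats the placeholder as literal text, so no escaping is needed
    line = line.replace(placeholder, str(ParameterDictionary[name]))

print(line)  # Text1 Na Text2 1.0
```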
User response:
You need to capture one or more word characters (alphanumerics, including underscores) between <<ALTER, variable = and >>, and then use a callable as the replacement argument of re.sub:
See the Python demo:
import re
ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
rx = r'<<ALTER, variable = (\w+)>>'
result = re.sub(rx, lambda x: ParameterDictionary.get(x.group(1), x.group()), line)
print(result)
# => Text1 Na Text2 1.0
Here,
- <<ALTER, variable = (\w+)>> matches <<ALTER, variable =, a space, then (\w+) captures one or more word chars into Group 1, and then >> is matched.
- Each match is passed to the lambda expression as x, and ParameterDictionary.get(x.group(1), x.group()) returns either the value for the found key or, if the key is not in the dictionary, the whole match (x.group()).
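To see the fallback in action, here is a sketch with a named function instead of the lambda and a precompiled pattern; the variable name Missing is hypothetical and deliberately absent from the dictionary, so its placeholder is left untouched:

```python
import re

ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
rx = re.compile(r'<<ALTER, variable = (\w+)>>')

def substitute(match):
    # Look up the captured name; keep the original placeholder if it is unknown.
    return ParameterDictionary.get(match.group(1), match.group())

text = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Missing>>\n'
result = rx.sub(substitute, text)
print(result)  # Text1 Na Text2 <<ALTER, variable = Missing>>
```

A named function is handy when the replacement logic grows beyond a one-liner, for example to log unknown variables.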
User response:
Using capturing groups within re.sub seems to be what you are looking for. re.sub accepts a function as the repl (replacement) argument; the function is called with each match object as its argument, and its return value is used as the replacement string. See the docs.
>>> import re
>>> param_dict = {'Ion1': 'Na', 'Value1': '1.0'}
>>> re.sub(r'<<ALTER, variable = ([\w\d]+)>>', lambda m: param_dict[m.group(1)], line)
'Text1 Na Text2 1.0\n'
The regex group ([\w\d]+) can be adapted to the kind of variable names you expect to find (\w already covers digits, so \d is redundant here).
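As one example of adapting the group, suppose (hypothetically) that parameter names may also contain . or -, which \w alone does not cover. Widening the character class is enough; the names and values below are invented for illustration:

```python
import re

# Hypothetical parameter names containing '.' and '-'
param_dict = {'conc.Na': '0.15', 'T-ref': '298'}
line = 'c = <<ALTER, variable = conc.Na>>, T = <<ALTER, variable = T-ref>>'

# Inside a character class, '.' is literal, and a trailing '-' is literal too
rx = r'<<ALTER, variable = ([\w.-]+)>>'
result = re.sub(rx, lambda m: param_dict[m.group(1)], line)
print(result)  # c = 0.15, T = 298
```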
Using raw strings (prefixed with r, e.g. r'...') for regexes in Python is good practice and can save you from headaches with backslashes.
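A small illustration of why raw strings matter: \w happens to survive in a normal string literal (Python keeps unknown escapes as-is, though recent versions warn about them), but \b is turned into a backspace character and silently changes the pattern. The word-boundary pattern here is only an example:

```python
import re

# Without the r prefix, '\b' is a backspace character (U+0008), not a word boundary.
plain = '\bIon1\b'
raw = r'\bIon1\b'
print(len(plain), len(raw))  # 6 8

text = 'Ion1 and Ion10'
print(re.findall(plain, text))  # []
print(re.findall(raw, text))    # ['Ion1']
```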