Python re.search for multiple values in the same line

I am trying to use re.search (or re.findall) to parse a line and replace each keyword with a value.

My example string is:

line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'

With values of Ion1 of 'Na' and Value1 of 1.0, I would like to have the return of

processedline = 'Text1 Na Text2 1.0'

To do so, I tried the following code:

result = re.search('<<ALTER(.*)>>', line)
aux_txt = result.group(1).split('=')
var = aux_txt[-1].strip()
value = ParameterDictionary[var]
processedline = re.sub('<<ALTER(.*)>>', str(value), line, flags=re.DOTALL)

However, what I get from result.group(1) is ', variable = Ion1>> Text2 <<ALTER, variable = Value1', i.e., it does not treat the two keywords independently.

Does anyone have an idea? Thanks in advance!

CodePudding user response：

That is because your regex matches up to the last >> on the line instead of stopping at the first >> after Ion1. You need to make .* lazy, i.e. use .*?, to limit the match.

What .*? does is this: it matches any character (except a newline, by default) zero or more times, as few times as possible, expanding only as needed (lazy).
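A minimal sketch of the difference, using the line from the question (the names greedy and lazy are only for illustration):

```python
import re

line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'

# Greedy: .* runs to the last '>>' on the line, so there is a single match.
greedy = re.findall(r'<<ALTER(.*)>>', line)

# Lazy: .*? stops at the first '>>' that lets the pattern succeed.
lazy = re.findall(r'<<ALTER(.*?)>>', line)

print(greedy)  # [', variable = Ion1>> Text2 <<ALTER, variable = Value1']
print(lazy)    # [', variable = Ion1', ', variable = Value1']
```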

Here is an example with an explanation: https://regex101.com/r/oKyOIn/1

CodePudding user response：

Using .* is too broad: it captures everything between the first <<ALTER and the last >>. Why not use a more specific regexp?

>>> re.findall(r"<<ALTER, variable = (\w+)>>", line)
['Ion1', 'Value1']

CodePudding user response：

re.search is the wrong tool for this task: it returns only the first (leftmost) match, or None if no match is found. Use either re.finditer, which gives an iterator of Match objects, or re.findall, which gives a list of strs or tuples.

Also, as already noted, you need to change your pattern <<ALTER(.*)>> because it matches too much. You might use the non-greedy version, i.e.
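As a rough illustration of how the three functions differ on the line from the question (the pattern and the variable names below are my own choice, not part of the original post):

```python
import re

line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
pattern = r'<<ALTER, variable = (\w+)>>'

# re.search stops at the leftmost match and returns a Match object or None.
first = re.search(pattern, line)
first_name = first.group(1) if first else None

# re.findall returns the captured group of every match as a list of str.
names = re.findall(pattern, line)

# re.finditer yields Match objects, so the positions are available as well.
spans = [(m.group(1), m.start(), m.end()) for m in re.finditer(pattern, line)]

print(first_name)  # Ion1
print(names)       # ['Ion1', 'Value1']
```

re.finditer is the one to reach for when the position of each placeholder matters, for example to rebuild the line piece by piece.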

<<ALTER(.*?)>>

or, if > is not allowed between << and >>, exploit that as follows

<<ALTER([^>]*)>>
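A short sketch of the negated character class in use, assuming (as stated above) that > never appears inside a placeholder; the split on '=' mirrors the approach in the question:

```python
import re

line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'

# [^>]* cannot run past a '>', so each match ends at its own '>>'.
matches = re.findall(r'<<ALTER([^>]*)>>', line)

# Each capture still carries the ', variable = ' prefix; keep what follows '='.
names = [m.split('=')[-1].strip() for m in matches]

print(matches)  # [', variable = Ion1', ', variable = Value1']
print(names)    # ['Ion1', 'Value1']
```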

CodePudding user response：

Thanks a lot! It worked perfectly like this:

import re

ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
result = re.findall(r'<<ALTER, variable = (\w+)>>', line)
for txt in result:
    aux_txt = f'<<ALTER, variable = {txt}>>'
    value = ParameterDictionary[txt]
    line = re.sub(aux_txt, str(value), line, flags=re.DOTALL)

CodePudding user response：

You need to capture one or more word characters (letters, digits and underscores) between <<ALTER, variable = and >>, and then pass a callable as the replacement argument of re.sub:

See the Python demo:

import re
ParameterDictionary = {'Ion1': 'Na', 'Value1': '1.0'}
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
rx = r'<<ALTER, variable = (\w+)>>'
result = re.sub(rx, lambda x: ParameterDictionary.get(x.group(1), x.group()), line)
print(result)
# => Text1 Na Text2 1.0

Here,

<<ALTER, variable = (\w+)>> matches <<ALTER, variable = followed by a space, then (\w+) captures one or more word chars into Group 1, and finally >> is matched.
The match object is passed to the lambda as x, and ParameterDictionary.get(x.group(1), x.group()) returns either the value for the captured key, if it exists, or the whole match (x.group()), which leaves unknown placeholders unchanged.
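To see the fallback at work, here is a small sketch in which 'Value1' is deliberately left out of the dictionary (an assumption made only for this illustration). The str() call is my addition: the replacement callable must return a string, so it guards against non-string values such as a float 1.0.

```python
import re

ParameterDictionary = {'Ion1': 'Na'}  # 'Value1' is missing on purpose
line = 'Text1 <<ALTER, variable = Ion1>> Text2 <<ALTER, variable = Value1>>\n'
rx = r'<<ALTER, variable = (\w+)>>'

# Known keys are substituted; unknown keys fall back to the full match.
result = re.sub(rx,
                lambda x: str(ParameterDictionary.get(x.group(1), x.group())),
                line)

print(result)
# => Text1 Na Text2 <<ALTER, variable = Value1>>
```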

CodePudding user response：

Using groups to capture within re.sub seems to be what you are looking for. re.sub accepts a function as the repl argument (in place of a replacement string). The function is called with the match object as its argument. See docs.

>>> param_dict = {'Ion1': 'Na', 'Value1': '1.0'}
>>> re.sub(r'<<ALTER, variable = ([\w\d]+)>>', lambda m: param_dict[m.group(1)], line)
'Text1 Na Text2 1.0\n'

The regex group ([\w\d]+) can be adapted to the kind of names you expect to find. Note that \w already includes digits, so (\w+) is equivalent here.
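For instance, if the variable names could contain hyphens or dots, the character class might be widened as sketched below. The keys 'Ion-1' and 'Value.1' are hypothetical and exist only to illustrate the adaptation:

```python
import re

# Hypothetical keys containing '-' and '.', which \w alone would not match.
param_dict = {'Ion-1': 'Na', 'Value.1': '1.0'}
line = 'Text1 <<ALTER, variable = Ion-1>> Text2 <<ALTER, variable = Value.1>>\n'

# Inside [...], '.' is literal and a trailing '-' is literal as well.
result = re.sub(r'<<ALTER, variable = ([\w.-]+)>>',
                lambda m: param_dict[m.group(1)],
                line)

print(repr(result))  # 'Text1 Na Text2 1.0\n'
```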

Using raw strings (prefixed with r, as in r'...') for regexes in Python is good practice and can save you from headaches.
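One common headache is \b: in a normal string literal it is a backspace character, while in a regex it is a word boundary. A minimal sketch (the text variable is just sample input):

```python
import re

text = 'variable = Ion1'

# Without the r prefix, Python turns '\b' into a backspace before re sees it.
plain = re.search('\bIon1\b', text)

# With the r prefix, the regex engine receives \b and treats it as a word boundary.
raw = re.search(r'\bIon1\b', text)

print(plain)        # None
print(raw.group())  # Ion1
```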