I would like to split a string such as the following into tokens, discarding the whitespace:
double a[3] = {0.0, 0.0, 0.0};
The expected output is
['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
How could I do that with the re module in Python?
CodePudding user response:
You can make use of the fact that re.split() retains delimiters in the output when they are wrapped in a capture group:
import re
input_string = "double a[3] = {0.0, 0.0, 0.0};"
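# The capture group makes re.split() keep the matched tokens in its result.
pieces = re.split(r'((?:\d+\.\d+)|[,}=;]|\w+)', input_string)
# Strip surrounding whitespace and drop pieces that are empty afterwards.
bits = [piece.strip() for piece in pieces if piece.strip()]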
expected = ['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
assert bits == expected
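To see why the capture group matters, here is a minimal sketch on a shorter, made-up string (the variable names are illustrative only). Without the group, re.split() discards the delimiter; with it, the delimiter is kept as its own list element:

```python
import re

text = "a = 1"

# No capture group: the "=" delimiter is dropped from the result.
without_group = re.split(r'=', text)

# Capture group: the "=" delimiter is kept as a separate element.
with_group = re.split(r'(=)', text)

print(without_group)  # ['a ', ' 1']
print(with_group)     # ['a ', '=', ' 1']
```

The surrounding whitespace is still attached to the neighbouring pieces, which is why the answer strips each piece and filters out the empty ones.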
CodePudding user response:
One approach here might be to use re.findall:
import re
inp = "double a[3] = {0.0, 0.0, 0.0};"
parts = re.findall(r'\d+(?:\.\d+)?|\w+|[^\s\w]', inp)
print(parts)
# ['double', 'a', '[', '3', ']', '=', '{', '0.0', ',', '0.0', ',', '0.0', '}', ';']
The regex pattern used here says to match:
\d+(?:\.\d+)?   an integer or float
|               OR
\w+             a word (such as "double")
|               OR
[^\s\w]         a single non-word, non-whitespace character (such as "{")
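The order of the alternatives matters: \w also matches digits, so the number alternative has to be tried first. A small sketch, assuming a hypothetical helper named tokenize and a deliberately misordered pattern for comparison, might look like this:

```python
import re

# Number alternative first, then words, then any single symbol.
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?|\w+|[^\s\w]')

# Same alternatives, but with \w+ tried before the number pattern.
WRONG_ORDER_RE = re.compile(r'\w+|\d+(?:\.\d+)?|[^\s\w]')

def tokenize(text):
    """Return the list of tokens in text, skipping whitespace."""
    return TOKEN_RE.findall(text)

good = tokenize("x = 0.5;")
bad = WRONG_ORDER_RE.findall("x = 0.5;")

print(good)  # ['x', '=', '0.5', ';']
print(bad)   # ['x', '=', '0', '.', '5', ';']
```

With \w+ first, the float is broken apart at the decimal point, because \w+ consumes the leading digits before the number alternative gets a chance to match.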