I have various strings that might have multiple leading spaces.
string_1 = ' param A val A'
string_2 = 'param B val B'
....
I want to replace all multiple spaces with a single space IF the multiple spaces are not in the start of the string.
I want output of above to become
string_1 = ' param A val A'z
string_2 = 'param B val B'
My current solution replaces all multiple spaces with a single space regardless.
re.sub('\s ',' ',s)
How would I construct a pattern that only captures non leading multiple spaces?
CodePudding user response:
You can use \b\s{2,}\b
as your pattern, If those multiple spaces are leading, they are not in a word boundary. Also for multiple spaces use {2,}
instead of
to exclude single space:
import re
string_1 = " param A val A"
string_2 = "param B val B"
pattern = re.compile(r"\b\s{2,}\b")
for test in (string_1, string_2):
print(pattern.sub(" ", test))
output:
param A val A
param B val B
Note: Trailing multiple spaces is not changed this way. To do that you can omit last \b
then if converts to a single space again.
As noted by @JvdV, \b
doesn't take other range of characters into account. For example if you a string like "[ param A val A ]"
, the above pattern won't work for it. Instead you can use a Positive Look Behind assertion ((?<=\S)
) and Positive Look Ahead assertion ((?=\S)
) to match any non-white-space character:
>>> import re
>>> text = "[ param A val A ]"
>>> re.sub(r"\b\s{2,}\b", " ", text)
'[ param A val A ]'
>>> re.sub(r"(?<=\S)\s{2,}(?=\S)", " ", text)
'[ param A val A ]'
CodePudding user response:
Have a go with:
(?<=\S)\s (\s\S|$)
And replace with \1
. See an online demo
(?<=\S)
- Assert position is preceded by a non-whitespace char;\s
- 1 Whitespace char (greedy);(\s\S|$)
- Capture group to match a whitespace char and a non-whitespace char or a end-string anchor.
The above will leave only leading spaces alone. For example:
import regex as re
s = ' Abc 12$ D Efg '
print(re.sub(r'(?<=\S)\s (\s\S|$)', '\1', s))
Prints: Abc 12$ D Efg
CodePudding user response:
split your input into two groups (1st group: leading space, 2nd group: everything else), then eliminate the multispaces from 2nd group, and join groups:
>>> string_1 = ' param A val A'
>>> sb = re.compile(r'^(\s )(.*)')
>>> mg = sb.match(string_1).groups()
>>> mg[0] ' '.join(mg[1].split())
' param A val A'