I have a string with excess whitespace. I want to remove any whitespace at the start of each line, up to the color name. I also want to preserve single spaces between words, leave colons alone when they don't precede a percentage (see the Pastels line in the string for an example), and keep the number of spaces after the colon (1 space for double digits, 2 spaces for single digits). So far I'm preserving everything I want, but I can't get rid of single spaces after the \n.
How do I remove all whitespace after a newline and at the start of the string in one pattern?
I want the string to look like this: 'Red: 80%\nNavy Blue: 15%\nGreen:  3%\nPastels: Pink, Baby Blue, Lavender:  2%'
import re

# Leading whitespace varies per line; single-digit percentages have 2 spaces after the colon
my_string = '  Red: 80%\n Navy Blue: 15%\n  Green:  3%\n  Pastels: Pink, Baby Blue, Lavender:  2%'
my_pattern = re.compile(r'(?<![:])[ ]{2,}')  # match 2 or more spaces unless they follow a colon
# the following:
re.sub(my_pattern, '', my_string)
# returns this:
'Red: 80%\n Navy Blue: 15%\nGreen:  3%\nPastels: Pink, Baby Blue, Lavender:  2%'  # Note the number of spaces after the colons and newlines.
# The space before "Navy Blue" is the problem.
# This would give me the desired result, but what pattern would let me do it all within one re.sub()?
re.sub(my_pattern, '', my_string).replace('\n ', '\n')
# returns this:
'Red: 80%\nNavy Blue: 15%\nGreen:  3%\nPastels: Pink, Baby Blue, Lavender:  2%'
CodePudding user response:
Found a solution. Far simpler than I was originally thinking:
my_pattern = re.compile(r'(?m)^\s+')  # (?m) sets multiline mode
# ^\s+ matches one or more whitespace characters immediately following the start of a line
# a slightly cleaner way of writing the same thing:
my_pattern = re.compile(r'^\s+', re.MULTILINE)
# the following:
re.sub(my_pattern, '', my_string)
# returns:
'Red: 80%\nNavy Blue: 15%\nGreen:  3%\nPastels: Pink, Baby Blue, Lavender:  2%'
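To check the multiline pattern end to end, here is a minimal sketch. The sample string is an assumption: the exact spacing of the original input is not certain, so this one uses two leading spaces on most lines, one before "Navy Blue", and two spaces after the colon for single-digit percentages.

```python
import re

# Assumed sample input; the exact spacing is illustrative only
my_string = ('  Red: 80%\n Navy Blue: 15%\n  Green:  3%\n'
             '  Pastels: Pink, Baby Blue, Lavender:  2%')

# In MULTILINE mode, ^ matches at the start of the string and right after each newline
leading_ws = re.compile(r'^\s+', re.MULTILINE)

cleaned = leading_ws.sub('', my_string)
print(repr(cleaned))
```

Because the pattern is anchored to the start of a line, the spaces after the colons are never candidates for removal, so no lookbehind is needed.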
CodePudding user response:
In order to remove only horizontal whitespace chars from the start of each line, you can use
my_pattern = re.compile(r'(?m)^[^\S\r\n]+')
# or, equivalently:
my_pattern = re.compile(r'^[^\S\r\n]+', re.M)
# or:
my_pattern = re.compile(r'^[^\S\r\n]+', re.MULTILINE)
# and then use my_pattern.sub:
text = my_pattern.sub('', text)
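The difference from a plain \s-based pattern shows up when the text contains blank lines. The following is a small sketch on a made-up string (the sample text and variable names are assumptions for illustration): since \s also matches \n, the \s version swallows the empty line, while the horizontal-only class leaves it in place.

```python
import re

# Made-up sample: indented lines, one blank line, and a CRLF line ending
text = '  first\n\n\t second\r\n  third'

any_ws = re.compile(r'^\s+', re.M)                # \s includes \n and \r
horizontal_ws = re.compile(r'^[^\S\r\n]+', re.M)  # whitespace except \r and \n

with_any = any_ws.sub('', text)
with_horizontal = horizontal_ws.sub('', text)

print(repr(with_any))         # the blank line is gone
print(repr(with_horizontal))  # the blank line is kept
```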
Note the (?m) inline modifier flag is equivalent to the re.M option. It is handy when you pass a regex to some function/method defined in a linked library and do not want to import the re module just to be able to use the flag.
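As a rough illustration of that point, the sketch below uses a hypothetical helper, strip_matches, standing in for a library function that accepts only a pattern string and offers no flags argument. It is not a real API; it only shows that the inline flag and the re.M option give the same result.

```python
import re

text = '  a\n  b'

# Hypothetical stand-in for a third-party function that takes only a
# pattern string, so the caller cannot pass re.M to it.
def strip_matches(pattern: str, s: str) -> str:
    return re.sub(pattern, '', s)

via_inline = strip_matches(r'(?m)^[^\S\r\n]+', text)      # flag inside the pattern
via_flag = re.sub(r'^[^\S\r\n]+', '', text, flags=re.M)   # flag as an argument
without_flag = strip_matches(r'^[^\S\r\n]+', text)        # ^ only matches at the string start

print(repr(via_inline), repr(via_flag), repr(without_flag))
```

When calling re.sub directly, pass the flag by keyword (flags=re.M): the fourth positional parameter of re.sub is count, not flags.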
Details:
^ - start of a line
[^\S\r\n]+ - one or more chars ([^...] is a negated character class) other than a CR (carriage return, \r), an LF (line feed, \n), or a non-whitespace char (\S). So, this is the same as \s with the LF and CR chars subtracted from it.
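The "\s minus CR and LF" reading can be checked directly. This is a minimal sketch over a handful of ASCII characters (the candidate list is chosen here for illustration); note that form feed and vertical tab are still matched, since they are whitespace other than \r and \n.

```python
import re

horizontal = re.compile(r'[^\S\r\n]')

# A few ASCII whitespace characters plus one non-whitespace character
candidates = [' ', '\t', '\f', '\v', '\n', '\r', 'a']

matched = [c for c in candidates if horizontal.fullmatch(c)]
whitespace = [c for c in candidates if re.fullmatch(r'\s', c)]

print(matched)     # whitespace chars other than \r and \n
print(whitespace)  # everything \s accepts from the list
```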