Regex: How to match the closest whitespace before and after a specific character, and replace it with nothing

Time:10-08

I have a list of strings that look like the following example. Essentially, I am trying to strip all whitespace that comes before and after each vertical bar. This is in Python.

I am trying to go from this:

string = 'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1'

To this:

goal = 'DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1'

Please bear with me, as I have absolutely no experience with regular expressions. I have checked the following solutions and attempted to repurpose the code for my case, but to no avail.

Remove whitespace before a specific character in python?

Remove White space before and after a special character and join them python

CodePudding user response:

Explanation

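This can be done with a regex: \W matches any non-word character (here, the pipe), and \s* on either side consumes the whitespace around it, which the replacement then drops. Be aware that \W also matches other punctuation and whitespace itself, so this pattern is broader than the pipe alone.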

Try this:

import re

string = 'DOE|JOHN|123 ANY STREET        |NEW YORK CITY  |      NY|10001 | 1970/1/1'

final = re.sub(r"\s*(\W)\s*", r'\1', string)

print(final)

Output:

DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1
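To see how far \W reaches compared with an explicit pipe, here is a minimal sketch. The sample string is made up for illustration (it is not from the question); it contains a double space between words and a slash with spaces around it.

```python
import re

# Hypothetical input, chosen only to show how broad \W is.
sample = 'ANY  STREET | 1970 / 1'

broad = re.sub(r'\s*(\W)\s*', r'\1', sample)   # any non-word character
narrow = re.sub(r'\s*(\|)\s*', r'\1', sample)  # only the pipe

print(broad)   # ANY STREET|1970/1
print(narrow)  # ANY  STREET|1970 / 1
```

With \W, the double space is collapsed and the spaces around the slash disappear too. If only the pipe should be affected, matching it explicitly is the safer choice.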

CodePudding user response:

Regular expressions are perfect for just this type of situation. If you're looking to match only the pipe symbol, this will do what you need:

import re

string = 'DOE|JOHN|123 ANY STREET        |NEW YORK CITY  |      NY|10001 | 1970/1/1'

result = re.sub(r'\s*(\|)\s*', r'\1', string)

# result now contains 'DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1'
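If the pattern looks cryptic, it may help to see it spelled out piece by piece. The sketch below writes the same regex with re.VERBOSE so each part can carry a comment; the name pipe_with_padding is just an illustrative choice.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part can be commented.
pipe_with_padding = re.compile(r"""
    \s*     # zero or more whitespace characters before the pipe
    (\|)    # a literal pipe (escaped, since a bare | means alternation), captured as group 1
    \s*     # zero or more whitespace characters after the pipe
""", re.VERBOSE)

string = 'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1'
print(pipe_with_padding.sub(r'\1', string))
# DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1
```

The replacement r'\1' puts back only what group 1 captured, i.e. the pipe without its surrounding whitespace.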

If you are going to be running the same regex substitution many times, you may want to compile the regex first:

import re

string = 'DOE|JOHN|123 ANY STREET        |NEW YORK CITY  |      NY|10001 | 1970/1/1'
pattern = re.compile(r'\s*(\|)\s*')  # compile once, reuse for every substitution

result = pattern.sub(r'\1', string)

# result now contains 'DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1'
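Since the question mentions a list of strings, a compiled pattern can be reused across the whole list. This is a rough sketch: the second row is invented for illustration, and the pattern here skips the capture group and substitutes a literal '|' instead, which should be equivalent for this case.

```python
import re

# Compile once, reuse for every string in the list.
pipe_padding = re.compile(r'\s*\|\s*')

# Hypothetical list; only the first entry comes from the question.
rows = [
    'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1',
    'ROE | JANE |456 OTHER ROAD| BOSTON |MA | 02101|1980/2/2',
]

cleaned = [pipe_padding.sub('|', row) for row in rows]

for row in cleaned:
    print(row)
# DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1
# ROE|JANE|456 OTHER ROAD|BOSTON|MA|02101|1980/2/2
```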

CodePudding user response:

This can easily be done with plain Python string methods and does not require regex. Split the input string on pipes ("|") with split(), remove leading and trailing whitespace from each piece with strip(), and put it all back together with join().

string = 'DOE|JOHN|123 ANY STREET        |NEW YORK CITY  |      NY|10001 | 1970/1/1'

print("|".join([x.strip() for x in string.split("|")]))

Output:

DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1
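One difference from the regex approach is worth knowing: split() plus strip() also trims whitespace at the very start and end of the whole string, while a pattern anchored on the pipe leaves it alone. A small sketch, assuming a made-up padded input and a hypothetical helper named strip_fields:

```python
import re

def strip_fields(line, sep='|'):
    """Split on sep, strip each field, and rejoin (hypothetical helper)."""
    return sep.join(field.strip() for field in line.split(sep))

sample = '  DOE | JOHN  '   # made-up input with padding at both ends

via_split = strip_fields(sample)
via_regex = re.sub(r'\s*\|\s*', '|', sample)

print(repr(via_split))  # 'DOE|JOHN'
print(repr(via_regex))  # '  DOE|JOHN  '
```

For the strings in the question both approaches give the same result; which behaviour is preferable at the edges depends on the data.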