I have a list of strings that look like the example below. Essentially, I am trying to strip all whitespace that comes before and after each vertical bar. This is in Python.
I am trying to go from this:
string = 'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1'
To this:
goal = 'DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1'
Please bear with me, as I have absolutely no experience with regular expressions. I have checked the following solutions and attempted to repurpose the code for my case, but to no avail.
Remove whitespace before a specific character in python?
Remove White space before and after a special character and join them python
CodePudding user response:
Explanation
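It can be done simply with a regex: \W matches any non-word character (anything other than a letter, digit, or underscore, so it covers the pipe as well as spaces), and \s* on either side of it removes the whitespace before and after that character.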
Try this:
import re
string = 'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1'
final = re.sub(r"\s*(\W)\s*", r'\1', string)
print(final)
Output:
DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1
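One thing to keep in mind: because \W matches any non-word character, this pattern also trims whitespace around other punctuation, not just around the pipe. The sketch below uses a made-up record (the sample value is an assumption, not data from the question) to compare it with a pattern that matches only the literal pipe:

```python
import re

# Hypothetical sample: note the period followed by a space in "ST. LOUIS".
sample = 'DOE | JOHN|12 MAIN ST. | ST. LOUIS'

# \W matches any non-word character, so the space after "." is removed too.
any_nonword = re.sub(r'\s*(\W)\s*', r'\1', sample)

# Matching only the literal pipe leaves the inside of each field untouched.
pipe_only = re.sub(r'\s*\|\s*', '|', sample)

print(any_nonword)  # DOE|JOHN|12 MAIN ST.|ST.LOUIS
print(pipe_only)    # DOE|JOHN|12 MAIN ST.|ST. LOUIS
```

If the fields can contain punctuation such as periods or commas, the pipe-only pattern is probably the safer choice.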
CodePudding user response:
Regular expressions are perfect for just this type of situation. If you're looking to match only the pipe symbol, this will do what you need:
import re
string = 'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1'
result = re.sub(r'\s*(\|)\s*', r'\1', string)
# result now contains 'DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1'
If you are going to be running the same regex substitution many times, you may want to compile the regex first:
import re
string = 'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1'
replacement = re.compile(r'\s*(\|)\s*')
result = replacement.sub(r'\1', string)
# result now contains 'DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1'
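Since the question mentions a list of strings, here is a minimal sketch of reusing one compiled pattern across several records. The name PIPE_WITH_SPACES and the second record are assumptions made up for illustration; the capture group is dropped because the replacement is always a plain pipe.

```python
import re

# Compile once, then reuse the pattern for every record.
PIPE_WITH_SPACES = re.compile(r'\s*\|\s*')

# Hypothetical list; the first entry is the string from the question.
records = [
    'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1',
    'ROE | JANE |456 OTHER ROAD| BOSTON |MA | 02108|1980/2/2',
]

# Replace each pipe, together with any whitespace around it, by a bare pipe.
cleaned = [PIPE_WITH_SPACES.sub('|', record) for record in records]

for record in cleaned:
    print(record)
```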
CodePudding user response:
This can easily be done with native Python and does not require regex. Split the input string on pipes ("|") with split(), remove leading and trailing whitespace from each piece with strip(), and put it all back together with join().
string = 'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1'
print("|".join([x.strip() for x in string.split("|")]))
Output:
DOE|JOHN|123 ANY STREET|NEW YORK CITY|NY|10001|1970/1/1
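For a whole list of strings, the same split/strip/join idea could be wrapped in a small helper. This is only a sketch: the function name strip_fields, its sep parameter, and the second record are hypothetical. Unlike a pattern that only looks at whitespace next to a pipe, strip() also removes whitespace at the very start and end of each string.

```python
def strip_fields(line, sep="|"):
    """Split on sep, strip each field, and rejoin with sep."""
    return sep.join(field.strip() for field in line.split(sep))

# Hypothetical list; the first entry is the string from the question.
records = [
    'DOE|JOHN|123 ANY STREET |NEW YORK CITY | NY|10001 | 1970/1/1',
    '  ROE | JANE |BOSTON  ',
]

cleaned = [strip_fields(record) for record in records]

for record in cleaned:
    print(record)
```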