I am trying to remove ?, ( ) and " from the string below. My initial matches work, but these three do not match. Kindly suggest what the issue is.
data = "(The rain - :in ,Spain?)"
regexpat = re.sub(r"'|,|;|.|:|/?|-|(|)", "", name)
I need all the special chars removed or replaced with blank
I tried with
-|:|\)|\?|
but substitution is breaking with this output *
*"*(*T*h*e* *r*a*i*n* ** **i*n* * *,*S*p*a*i*n***"*
*
Expected Output is
The rain in Spain
trying here - https://regex101.com/r/Ozsnzv/1
CodePudding user response:
import re
data = "(The rain - :in ,Spain?)"
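# Remove the unwanted punctuation; inside [...] the dot and parentheses are literal.
regexpat = re.sub(r"[',;.:/?()-]", "", data)
# Collapse the runs of whitespace left behind into single spaces.
regexpat = re.sub(r"\s+", " ", regexpat)
print(regexpat)  # The rain in Spain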
CodePudding user response:
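You can use a character class to shorten the pattern. Outside a character class an unescaped dot matches any character, so it has to be escaped to match literally; inside a character class it is already literal. The same goes for ( and ), which otherwise open and close a group.
Using /? in an alternation matches an optional /, but you can just add the / to the character class. If you also meant to match ?, you can add that one too.
The trailing | in your second attempt adds an empty alternative, which matches at every position; that is why the replacement shows up between all characters.
Then collapse any runs of spaces left behind into single spaces.
Note that the call to re.sub should use data instead of name.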
import re
data = "(The rain - :in ,Spain?)"
regexpat = re.sub(r"[',;.:/?()-]+", "", data)
print(' '.join(regexpat.split()))
Output
The rain in Spain
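To make the difference concrete, here is a small sketch comparing the failing patterns with the character class. The variable names dot_result, empty_alt_result and class_result are only illustrative, and the first pattern is a shortened version of the one in the question.

```python
import re

data = "(The rain - :in ,Spain?)"

# An unescaped dot in an alternation matches any character (except a newline),
# so every character of the input is removed.
dot_result = re.sub(r"'|,|;|.|:", "", data)
print(repr(dot_result))

# A trailing | adds an empty alternative, which matches at every position,
# so the replacement text is inserted between the characters.
empty_alt_result = re.sub(r"-|:|\)|\?|", "*", data)
print(empty_alt_result)

# Inside a character class the same characters are treated literally.
class_result = re.sub(r"[',;.:/?()-]", "", data)
print(" ".join(class_result.split()))
```

The first call wipes out the whole string, the second scatters the replacement through the text, and only the character class removes just the listed characters.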
CodePudding user response:
A long shot, but you seem to want to remove all punctuation. One way is to use the punctuation constant from the standard-library string module:
import string
data = "(The rain - :in ,Spain?)"
regexpat = data.translate(str.maketrans('', '', string.punctuation))
print(' '.join(regexpat.split()))
Prints:
The rain in Spain
Note: string.punctuation contains !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ and may therefore be too broad compared to your current character class. If that is what you are after, however, it seems to be faster than a regex.
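If the full punctuation set is too broad, the same translate approach should also work with only the characters from the regex answers. Below is a rough sketch that does this and times both variants with timeit; the helper names with_translate and with_regex are made up for this example, and the timings will vary by machine and Python version, so treat them as indicative only.

```python
import re
import timeit

data = "(The rain - :in ,Spain?)"

# Translation table that deletes only the characters used in the regex answers.
table = str.maketrans("", "", "',;.:/?()-")
pattern = re.compile(r"[',;.:/?()-]")

def with_translate(text):
    # Delete the listed characters, then normalise whitespace.
    return " ".join(text.translate(table).split())

def with_regex(text):
    # Same result using a precompiled character class.
    return " ".join(pattern.sub("", text).split())

print(with_translate(data))
print(with_regex(data))

# Rough timing comparison; absolute numbers are not meaningful on their own.
print(timeit.timeit(lambda: with_translate(data), number=10_000))
print(timeit.timeit(lambda: with_regex(data), number=10_000))
```

Both helpers return the same cleaned string, so the choice comes down to readability and whatever the timings show on your own data.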