Using Regex to remove everything except words, digits and spaces.
This is the function I defined:
import re

def remove(text):
    return re.sub(r'[^\w\d\s]', '', text)
Is there anything redundant in the pattern, or anything I have missed?
CodePudding user response:
Your approach will work. For example:
import re

text = ' !"(/£hello world1!!!!%"& '

def remove(text):
    return re.sub(r'[^\w\d\s]', '', text)

print(remove(text))
Your output will be ' hello world1 ' (the leading and trailing spaces survive, because whitespace is in the allowed set).
CodePudding user response:
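\w already matches all letters ([A-Za-z], plus non-ASCII letters in Python 3), digits (\d), and the underscore _. That makes the \d in your character class redundant, and it means underscores are not removed.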
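A quick check of that claim, as a minimal sketch: the sample string is made up for illustration, and the comparison assumes Python 3 with ordinary str patterns (no re.ASCII flag).

```python
import re

# Hypothetical sample: contains an underscore, digits, punctuation,
# and a non-ASCII letter (\u00e9 is "e" with an acute accent).
sample = "snake_case 42 + caf\u00e9!"

with_d = re.sub(r'[^\w\d\s]', '', sample)   # pattern from the question
without_d = re.sub(r'[^\w\s]', '', sample)  # same class with \d dropped

print(with_d == without_d)  # the two patterns give the same result here
print(with_d)               # underscore and accented letter are both kept
```

Dropping \d does not change the result, since \w already covers digits, and the underscore stays in the output.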
So if underscores should be removed as well, try this pattern instead:
import re

def remove(text):
    return re.sub(r'[^A-Za-z\d\s]', '', text)
Let me know if it doesn't work.
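One trade-off worth noting: restricting the class to A-Za-z also strips non-ASCII letters. The sketch below compares the A-Za-z pattern from this answer with a possible alternative that keeps \w but removes underscores explicitly. Both function names and the sample string are hypothetical, chosen only for illustration.

```python
import re

def remove_ascii_only(text):
    # Hypothetical name: keeps only ASCII letters, digits and whitespace.
    return re.sub(r'[^A-Za-z\d\s]', '', text)

def remove_keep_unicode(text):
    # Hypothetical alternative: keeps every Unicode word character
    # and whitespace, but removes underscores via the second branch.
    return re.sub(r'[^\w\s]|_', '', text)

# Sample with a non-ASCII letter (\u00ef is "i" with a diaeresis).
sample = "na\u00efve_user 42!"

print(remove_ascii_only(sample))    # the accented letter is dropped too
print(remove_keep_unicode(sample))  # the accented letter is kept
```

Which variant fits depends on whether the input can contain accented or non-Latin letters that should be preserved.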