I have this:
words = ["Alice's", 'Adventures', 'in', 'Wonderland', "ALICE'S", 'ADVENTURES', 'IN', 'WONDERLAND', 'Lewis', 'Carroll', 'THE', 'MILLENNIUM', 'FULCRUM', 'EDITION', '3.0', 'CHAPTER', 'I', 'Down', 'the', 'Rabbit-Hole', 'Alice', 'was']
remove_strings = str.maketrans(' ', '!*01.23456,789-\,?\'\.(:;)\"!')
words = [s.translate(remove_strings) for s in words]
words = [words.lower() for words in words]
I want to get rid of all the punctuation and numbers.
But it only converts the words to lower case and does not remove the punctuation as I expected.
What am I doing wrong?
CodePudding user response:
str.maketrans maps each character in its first argument to the character at the same position in its second argument, so your current code asks it to map a space to other characters rather than the other way round. The two strings must also have the same length, so simply swapping the arguments is not enough. To delete characters, pass them as the third argument instead:
remove_strings = str.maketrans('', '', '!*0123456789-,?\'.(:;)"')
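For reference, here is a minimal sketch of the str.maketrans call forms. The sample strings and variable names are made up for illustration. The two-argument form expects strings of equal length, and the three-argument form deletes every character in its third argument.

```python
# Two-argument form: characters are paired up position by position.
table = str.maketrans("abc", "xyz")
mapped = "aabbcc".translate(table)

# Strings of unequal length are rejected with a ValueError.
try:
    str.maketrans(" ", "!?.")
    unequal_ok = True
except ValueError:
    unequal_ok = False

# Three-argument form: every character in the third string is deleted.
delete_table = str.maketrans("", "", "!?.'0123456789")
cleaned = "Alice's 3.0!".translate(delete_table)

print(mapped, unequal_ok, repr(cleaned))
```

Only the characters listed are affected, so the space in the last example survives the translation.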
An easier approach would be to use a regex substitution to replace every non-letter character with a space:
import re
words = [re.sub('[^a-z]', ' ', word, flags=re.I).lower() for word in words]
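Replacing with a space and deleting outright give different results, so it may help to compare the two on a few words. This is only a sketch: the sample list is a made-up subset of the question's data, and string.punctuation plus string.digits is assumed to be a reasonable stand-in for the hand-written character list.

```python
import re
import string

sample = ["Alice's", "3.0", "Rabbit-Hole", "CHAPTER"]

# Delete punctuation and digits outright, then lowercase.
table = str.maketrans("", "", string.punctuation + string.digits)
deleted = [w.translate(table).lower() for w in sample]

# Replace every non-letter with a space instead (the regex approach).
spaced = [re.sub(r"[^a-z]", " ", w, flags=re.I).lower() for w in sample]

# Drop entries that became empty, such as '3.0'.
kept = [w for w in deleted if w]

print(deleted)
print(spaced)
print(kept)
```

With deletion, "Alice's" becomes "alices", while the regex version yields "alice s". Which one is preferable depends on whether the apostrophe should split the word.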