Remove punctuation from a list

Time:09-25

I have this:

words = ["Alice's", 'Adventures', 'in', 'Wonderland', "ALICE'S", 'ADVENTURES', 'IN', 'WONDERLAND', 'Lewis', 'Carroll', 'THE', 'MILLENNIUM', 'FULCRUM', 'EDITION', '3.0', 'CHAPTER', 'I', 'Down', 'the', 'Rabbit-Hole', 'Alice', 'was']

remove_strings = str.maketrans('                           ', '!*01.23456,789-\,?\'\.(:;)\"!')

words = [s.translate(remove_strings) for s in words]
words = [word.lower() for word in words]

I want to get rid of all the punctuation and numbers.

But it only converts the words to lower case and does not remove the punctuation as I expected.

What am I doing wrong?

CodePudding user response:

str.maketrans maps each character in the first argument to the character at the same position in the second argument, so your current code only maps a space to another character. A quick fix is to swap the two arguments, which replaces each listed character with a space:

remove_strings = str.maketrans('!*01.23456,789-\,?\'\.(:;)\"!', '                           ')
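If the goal is to delete the characters rather than replace them with spaces, one option is the three-argument form of str.maketrans, whose third argument lists characters to drop. The sketch below first shows why the original argument order has no visible effect, then builds a deletion table. It assumes the unwanted characters are those in string.punctuation and string.digits (a broader set than the one in the question), and the names wrong_table, delete_table and cleaned are only illustrative:

```python
import string

# Original argument order: every key is a space, so the table collapses
# to a single entry mapping a space to the last character given.
wrong_table = str.maketrans('   ', '!*0')
print(wrong_table)  # {32: 48}, i.e. ' ' -> '0'

words = ["Alice's", 'Adventures', 'in', 'Rabbit-Hole', '3.0', 'CHAPTER']

# Third argument: characters that translate() removes entirely.
# Assumption: punctuation and digits are the characters to drop.
delete_table = str.maketrans('', '', string.punctuation + string.digits)

cleaned = [w.translate(delete_table).lower() for w in words]
# Drop entries that became empty, such as '3.0'.
cleaned = [w for w in cleaned if w]

print(cleaned)  # ['alices', 'adventures', 'in', 'rabbithole', 'chapter']
```

Whether to filter out the empty strings depends on what the list is used for afterwards; that last step is optional.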

An easier approach is to use a regex substitution that replaces every non-alphabetic character with a space:

import re

words = [re.sub('[^a-z]', ' ', word, flags=re.I).lower() for word in words]
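Replacing with a space leaves a gap wherever a character was removed, so "Alice's" becomes "alice s". If the characters should disappear completely, the same pattern can probably be used with an empty replacement string. A small sketch comparing the two on a shortened sample list (the names spaced and stripped are only illustrative):

```python
import re

words = ["Alice's", 'Rabbit-Hole', '3.0', 'I']

# Replace each non-letter with a space: a gap stays where it was.
spaced = [re.sub('[^a-z]', ' ', w, flags=re.I).lower() for w in words]

# Replace each non-letter with '': the character is deleted.
stripped = [re.sub('[^a-z]', '', w, flags=re.I).lower() for w in words]

print(spaced)    # ['alice s', 'rabbit hole', '   ', 'i']
print(stripped)  # ['alices', 'rabbithole', '', 'i']
```

Either way, entries made up only of digits and punctuation (such as '3.0') end up blank or empty and may need to be filtered out afterwards.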