I want to keep only ordinary letters and apostrophes using re.sub in Python, but right now my code also removes apostrophes, so don't becomes dont, etc. Can I make my re.sub call preserve apostrophes, or do I have to use some other solution?
My code right now:
import re

text = open("songs/" + artist + "/" + album + "/" + song, "r", encoding="latin-1")
lines = text.readlines()
for line in lines:
    line = line.lower()
    line = re.sub('[^a-z ]', '', line)
    words = line.split(" ")
CodePudding user response:
The code
re.sub('[^a-z ]', '', line)
takes every character that is not (^) either a lowercase letter a-z or a space, and removes it (by replacing it with '').
You want to add apostrophes to the list of characters that are preserved. In order to do so, you can either escape the single-quote/apostrophe character in your regex:
re.sub('[^a-z \']', '', line)
or use double-quotes in the string for your regex:
re.sub("[^a-z ']", '', line)
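To see the effect end to end, here is a minimal sketch of the cleaning step with the apostrophe added to the character class. The function name clean_line and the sample lyric are just illustrations, not part of the original code.

```python
import re

def clean_line(line):
    """Lowercase a line, keep only a-z, spaces and apostrophes, then split into words."""
    line = line.lower()
    # The apostrophe is now inside the negated class, so it is no longer removed.
    line = re.sub(r"[^a-z ']", "", line)
    # split() with no argument also drops empty strings caused by repeated spaces.
    return line.split()

print(clean_line("Don't stop, BELIEVIN'!\n"))
# → ["don't", 'stop', "believin'"]
```

Note that a typographic apostrophe (’) is a different character from the plain ' and would still be stripped unless you add it to the class as well.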
separate comment
By the way, a modern way of filling in a string with variables is an f-string (see the Python documentation on formatted string literals). Instead of
"songs/" + artist + "/" + album + "/" + song
you can use
f"songs/{artist}/{album}/{song}"
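As a quick check that the two forms build the same path, here is a small sketch. The values assigned to artist, album and song are made-up placeholders.

```python
# Hypothetical values, only for illustration.
artist, album, song = "abba", "arrival", "dancing_queen.txt"

path_concat = "songs/" + artist + "/" + album + "/" + song
path_fstring = f"songs/{artist}/{album}/{song}"

print(path_fstring)
# → songs/abba/arrival/dancing_queen.txt
print(path_concat == path_fstring)
# → True
```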