Goal: remove everything that comes after a four-digit number (such as a year). Below is a reprex: I have "years" and would like to get "years_goal", i.e. drop everything after the four-digit year, using a regex, str.replace, or any simpler suggestion.
years = ["Nov 1999",
"Oct. 2003",
"August 2007 8:00 pm et"]
years_goal = ["Nov 1999",
"Oct. 2003",
"August 2007"]
CodePudding user response:
You can use the re module.
The regex you need is ^[0-9 ]*[a-zA-Z ]+\d{4}
which matches a month name followed by a four-digit year.
It will not match strings with a dot after the month (such as "Oct. 2003"), so those are skipped, but it works for the others.
import re
regex = r"^[0-9 ]*[a-zA-Z ]+\d{4}"
for year in years:
    match = re.match(regex, year)
    # re.match returns None when the pattern does not fit, e.g. "Oct. 2003"
    if match:
        print(match[0])
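If the dotted abbreviations should be kept as well, one possible variation (an assumption on top of the answer above, not part of it) is to allow a literal dot in the month part of the pattern. A minimal sketch, with the list from the question repeated so it runs on its own; the names regex_with_dot and matched are only illustrative:

```python
import re

years = ["Nov 1999", "Oct. 2003", "August 2007 8:00 pm et"]

# Same idea as the pattern above, but the month part may also contain a dot.
regex_with_dot = r"^[0-9 ]*[a-zA-Z. ]+\d{4}"

matched = []
for year in years:
    m = re.match(regex_with_dot, year)
    if m:  # strings that do not fit the pattern are skipped
        matched.append(m[0])

print(matched)
```

This still assumes every string starts with a month name followed by the year; anything in another layout would be dropped rather than trimmed.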
CodePudding user response:
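You can either search for the end of the first four-digit number and slice off the rest (after import re; note that this raises an AttributeError if an item contains no four-digit number):
output = [item[:re.search(r"\d{4}", item).end()] for item in years]
Or you can match everything that follows a four-digit number and remove it. The flags must be passed by keyword, because the fourth positional argument of re.sub is count, not flags:
output = [re.sub(r"(?<=\d{4}).*", "", item, flags=re.DOTALL) for item in years]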
Both solutions scan each string once, so they should have almost the same speed and complexity.
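Either approach can be wrapped in a small helper that leaves a string unchanged when it contains no four-digit number. The sketch below is one way to do that, not the only one; the names YEAR and truncate_after_year are hypothetical, and it assumes the first run of four digits in each string is the year.

```python
import re

# Compiled once and reused for every string.
YEAR = re.compile(r"\d{4}")

def truncate_after_year(text):
    """Cut text right after its first four-digit run; return it unchanged if there is none."""
    match = YEAR.search(text)
    return text[:match.end()] if match else text

years = ["Nov 1999", "Oct. 2003", "August 2007 8:00 pm et"]
years_goal = [truncate_after_year(item) for item in years]
print(years_goal)
```

Because the pattern is not anchored to word boundaries, a longer digit run such as "12345" would be cut after its first four digits; adding \b around \d{4} would be one way to tighten that if the data calls for it.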