How to remove characters after a specific string

Time:12-31

Goal: remove the characters that come after a four-digit number (such as a year). Below is a reprex. I have "years" and would like to get "years_goal", i.e. remove everything after the four-digit year, using a regex, str.replace, or any simpler suggestion.

years = ["Nov 1999",
        "Oct. 2003",
        "August 2007 8:00 pm et"]

years_goal = ["Nov 1999",
            "Oct. 2003",
            "August 2007"]

CodePudding user response:

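You can use the re module.

The regex you need is ^[0-9 ]*[a-zA-Z ]+ \d{4}, which matches a month name followed by a four-digit year.

It will not print strings that have a dot after the month (such as "Oct. 2003"), but it works for the others.

import re

# Optional leading digits/spaces, a month name, one space, then four digits.
regex = r"^[0-9 ]*[a-zA-Z ]+ \d{4}"
for year in years:
    match = re.match(regex, year)
    if match:  # entries with a dot after the month do not match and are skipped
        print(match[0])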
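
If the abbreviated months with a dot should be kept as well, one possible variation is to allow an optional dot after the month name. This is only a sketch: the pattern and the names month_year and trimmed are assumptions for illustration, not part of the answer above, and entries that do not match are left unchanged rather than skipped.

```python
import re

years = ["Nov 1999", "Oct. 2003", "August 2007 8:00 pm et"]

# Assumed variant: month letters, an optional dot, one space, four digits.
month_year = re.compile(r"^[A-Za-z]+\.? \d{4}")

trimmed = []
for entry in years:
    match = month_year.match(entry)
    # Keep the entry unchanged when the pattern does not match.
    trimmed.append(match[0] if match else entry)

print(trimmed)  # ['Nov 1999', 'Oct. 2003', 'August 2007']
```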

CodePudding user response:

You can either search for the end of the first four-digit number and slice off the rest:

import re
output = [item[:re.search(r"\d{4}", item).end()] for item in years]

Or you can match everything after a four-digit number and remove it:

# flags must be passed by keyword; the fourth positional argument of re.sub is count
output = [re.sub(r"(?<=\d{4}).*", "", item, flags=re.DOTALL) for item in years]

Both solutions should have almost the same speed and complexity.
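
Note that the slicing approach raises an AttributeError when a string contains no four-digit number, because re.search returns None. A minimal sketch of a helper that falls back to the original string in that case is shown here; the function name truncate_after_year and the use of \b word boundaries (so that longer digit runs are not treated as years) are assumptions, not part of the answers above.

```python
import re

def truncate_after_year(text):
    """Return text cut off right after its first standalone four-digit number."""
    match = re.search(r"\b\d{4}\b", text)
    # No four-digit number found: return the string unchanged.
    return text[:match.end()] if match else text

years = ["Nov 1999", "Oct. 2003", "August 2007 8:00 pm et"]
output = [truncate_after_year(item) for item in years]
print(output)  # ['Nov 1999', 'Oct. 2003', 'August 2007']
```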
