Is there a way to remove all letters from a string

Time:03-25

I have a list of titles that combine dates and descriptions, and I need to reduce it to just a list of dates. Some example titles:

1/16 Stories of Time

5/18 Cock'a'doodle'do

However, some people are really bad at typing and have forgotten the spaces between the dates and the rest of the title. I need to remove everything except for numbers and the slashes between them. Using any method, but preferably regex, is there a simple way to do this? For the record, I do understand how to split and recompile the list for any method that would work on a single string.

CodePudding user response:

You can import string to get easy access to a string of all digits, add the slash to it, and then compare your date string against that to drop any character from the date string that's not in there:

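import string

date_string = "5/18 Cock'a'doodle'do"  # example input
allowed = string.digits + "/"  # characters to keep: 0-9 and the slash
for character in date_string:
    # remove every occurrence of any character that is not allowed
    if character not in allowed:
        date_string = date_string.replace(character, "")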

This will convert the date_string 5/18 Cock'a'doodle'do to just 5/18 without using regex at all.
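Since the question prefers regex, the same filtering can be sketched with re.sub and a negated character class. This is a minimal sketch, assuming the goal is simply to keep digits and slashes; the function name keep_digits_and_slashes is made up for illustration.

```python
import re

def keep_digits_and_slashes(text):
    """Drop every character that is not an ASCII digit or a slash."""
    return re.sub(r"[^0-9/]", "", text)

print(keep_digits_and_slashes("5/18 Cock'a'doodle'do"))  # 5/18
print(keep_digits_and_slashes("6/22Bible"))              # 6/22
```

Like the loop, this keeps digits and slashes from anywhere in the string, so a title that itself contains numbers or a slash will leave extra characters behind.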

CodePudding user response:

You're thinking about this backwards. If you want to extract the date at the start of a line, do that instead of trying to get rid of everything else.

You can use a regex like this: ^\d{1,2}/\d{1,2} which means:

  • ^ start of line
  • \d digit
    • {1,2} repeated one or two times
  • / a literal slash

For example:

import re

lines = [
    '1/16 Stories of Time',
    "5/18 Cock'a'doodle'do",
    '6/22Bible']

for line in lines:
    match = re.match(r'^\d{1,2}/\d{1,2}', line)
    if match:
        print(match.group(0))

Output:

1/16
5/18
6/22

(Note that re.match always starts matching from the start of the string, so the ^ is redundant here.)

This is more robust against titles that themselves contain numbers and slashes, for example 4/5 The 39 Steps / The Thirty-Nine Steps -> 4/5.

However, you'll have a problem if someone forgot the space in a title that starts with a number, for example 7/8100 Years of Solitude -> 7/81.
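One way to reduce that ambiguity, though not eliminate it, might be to restrict the pattern to plausible month and day values. The sketch below assumes the dates are always month/day; DATE_RE and extract_date are hypothetical names, and it does not check how many days each month has.

```python
import re

# Month 1-12 (optional leading zero), then day 1-31 (optional leading zero).
# Assumption: every title begins with a month/day date.
DATE_RE = re.compile(r"(0?[1-9]|1[0-2])/(3[01]|[12][0-9]|0?[1-9])")

def extract_date(title):
    """Return the leading month/day date, or None if the title has none."""
    match = DATE_RE.match(title)
    return match.group(0) if match else None

for title in ["1/16 Stories of Time", "6/22Bible", "7/8100 Years of Solitude"]:
    print(extract_date(title))
```

With this pattern 7/8100 Years of Solitude gives 7/8, because 81 is not a valid day. A title such as 1/2100 Years would still come out as 1/21, so a missing space before a number remains a guess.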
