How to identify a word that starts with a number or special character not using regex?

Time:11-21

I need to remove words from a string that begin with a number or special character. I cannot simply put specific values since the string is based on user input, so it will be unknown. All I've been able to come up with, without having to import anything, is using

.startswith(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, '!', '"', '#', '$', '%', '&', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\', ']', '^', '_', '`', '{', '|', '}', '~', ')', ':')

There must be an easier, less lengthy way, right?

CodePudding user response:

I need to remove words from a string that begin with a number or special character. [...]

I'd suggest taking a look at the string module. It is part of the standard library and defines constants for common character sets such as punctuation, digits, and ASCII letters.

From there, it is straightforward to copy the values you need out of the string module and define them as variables in your own code, so no import is required:

digits = '0123456789'
punctuation = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""

invalid_start_chars = digits + punctuation

Then test with a sample input:

string = "Hello 123World H0w's @It [Going? Testing123"

print(' '.join(
    [word for word in string.split()
     if word[0] not in invalid_start_chars]
))

Output:

Hello H0w's Testing123
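
If an import is acceptable, the same filter can be written against string.digits and string.punctuation directly instead of copying the values by hand. This is only a minimal sketch: the names remove_invalid_words and INVALID_START_CHARS are made up for illustration, and it assumes "special character" means ASCII punctuation.

```python
import string

# Characters a word may not start with: ASCII digits plus ASCII punctuation.
# (INVALID_START_CHARS is a hypothetical name, not part of the string module.)
INVALID_START_CHARS = set(string.digits + string.punctuation)


def remove_invalid_words(text: str) -> str:
    """Drop whitespace-separated words that start with a digit or punctuation."""
    # str.split() with no arguments never yields empty strings,
    # so indexing word[0] is safe here.
    return " ".join(
        word for word in text.split()
        if word[0] not in INVALID_START_CHARS
    )


print(remove_invalid_words("Hello 123World H0w's @It [Going? Testing123"))
# Hello H0w's Testing123
```

Using a set keeps the membership test cheap, though for inputs this small a plain string works just as well.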

CodePudding user response:

I'd recommend using the standard library's string module.

from string import punctuation


def check_word(word: str) -> bool:
    return not word.startswith(tuple(punctuation + '0123456789'))


def fix_string(s: str) -> str:
    return " ".join(word for word in s.split() if check_word(word))

You can then use fix_string like this:

s = "Hello !world! This is a good .test"
print('Result:', fix_string(s))
# Result: Hello This is a good
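
If you want to avoid the import entirely, one possible alternative is to test the first character with str.isalpha(). Note that this is a different check from the one above, not an equivalent: it keeps any word that starts with a Unicode letter and drops everything else, including non-ASCII digits and symbols. The function name fix_string_isalpha is hypothetical.

```python
def fix_string_isalpha(s: str) -> str:
    """Keep only words whose first character is a letter (any Unicode letter)."""
    # s.split() never yields empty strings, so word[0] always exists.
    return " ".join(word for word in s.split() if word[0].isalpha())


print(fix_string_isalpha("Hello !world! This is a good .test"))
# Hello This is a good
print(fix_string_isalpha("été 9lives _private ok"))
# été ok
```

Which variant fits depends on whether words starting with accented or non-Latin letters should survive; the punctuation-based version only looks at the ASCII character set.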