Regex to match first occurrence of non alpha-numeric characters

I am parsing some user input for a basic Discord bot that assigns roles and such. I am trying to generalize some code so I can reuse it for similar tasks (doing similar things in different categories/channels).

Generally, I am looking for a substring (the category), then taking the string after it as that category's value. I go line by line looking for my category, replace the "category" substring, and return a stripped version. However, what I have now also removes every space in the "value" string.

Originally the string looks like this:

Gamertag : 00test gamertag

What I want to do is preserve the spaces in the value. The regex I am trying to write is: match all non alpha-numeric chars until the first letter.

My return statement already matches non-alphanumeric characters, but I can't figure out how to match just the first group. It looks like it should be as simple as adding a ? to make the quantifier lazy, but I'm not sure. Example code and string are below (the regex I want to replace is in the final print statement).

String I am working with:

- 00test Gamertag      #(or any non-alpha delimiter)

Desired result (after matching and stripping the extra characters):

00test Gamertag     #(remove leading space and any non-alpha characters before the first words)

It should be something like the following, which is close to what I use to strip non-alphanumerics now, but that pattern matches every group of them rather than only the first. I want to match just the first group of non-alphanumerics in the string and strip that part using re.sub.

\W+?

https://www.online-python.com/gDVhZrnmlq

Thank you!

CodePudding user response：

Your regex will substitute non-alphanumeric characters anywhere in the input string. If you only need this to happen at the start of the string, use the start-of-input anchor (i.e. ^):

^\W+
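
A minimal sketch of the anchored approach, using `^\W+` so the whole leading run is consumed in one match. The helper name `strip_leading_non_alnum` is hypothetical and not part of the question's code. Note that `\W` treats the underscore as a word character, so a leading `_` would be kept.

```python
import re


def strip_leading_non_alnum(value: str) -> str:
    # ^ anchors the match at the start of the string, so only the leading
    # run of non-word characters is removed; inner spaces are untouched.
    return re.sub(r"^\W+", "", value)


print(strip_leading_non_alnum(" - 00test Gamertag"))  # 00test Gamertag
print(strip_leading_non_alnum(": 00test gamertag"))   # 00test gamertag
```

Because the pattern is anchored, the lazy `?` is not needed: there is at most one match, and it always starts at position 0.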

CodePudding user response：

It depends on your inputs. You can use two regexes to achieve your goal: the first removes every character that is neither alphanumeric nor whitespace, including those between words, and the second collapses any run of spaces between words into a single space:

import re


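gamer_tag = "µ& - 00test          -   Gamertag"
# Drop everything that is not a letter, a digit, or whitespace
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
# Collapse each run of spaces into a single space
gamer_tag = re.sub(r" +", " ", gamer_tag)
print(gamer_tag.strip())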

# Output: 00test Gamertag

You can remove the second re.sub() if you are sure there will be no more than one space between words.

import re

gamer_tag = "- 00test Gamertag "
# Drop everything that is not a letter, a digit, or whitespace
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
print(gamer_tag.strip())

# Output: 00test Gamertag
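
One thing to keep in mind with the character-class approach is that it also deletes punctuation inside the value, not just the leading delimiter. A small comparison sketch, assuming a made-up tag `x-ray 99` that is not from the question:

```python
import re

raw = "- x-ray 99 "  # hypothetical input with a hyphen inside the value

# Character-class removal: strips every non-alphanumeric, non-space character
everywhere = re.sub(r"[^a-zA-Z0-9\s]", "", raw).strip()

# Anchored removal: strips only the leading run of non-word characters
leading_only = re.sub(r"^\W+", "", raw).strip()

print(everywhere)    # xray 99
print(leading_only)  # x-ray 99
```

Which one fits depends on whether punctuation inside the value should survive.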