Python regex: replace numbers with a special character

Imagine I have a string like dslkf 234 dkf23 12asd 2 23 4. I want to replace all standalone numbers with <NUM>.

I have tried re.sub(r'\s\d+\s', ' <NUM> ', s). I want to end up with dslkf <NUM> dkf23 12asd <NUM> <NUM> <NUM>, but what I get is: dslkf <NUM> dkf23 12asd <NUM> 23 4

I know why the "4" is not replaced: it's not followed by a whitespace character. But I couldn't figure out why the "23" is skipped.

CodePudding user response：

Do a replacement on \b\d+\b:

import re

inp = "dslkf 234 dkf23 12asd 2 23 4"
# \b is a zero-width word boundary, so only runs of digits that form a whole word match
output = re.sub(r'\b\d+\b', r'<NUM>', inp)
print(output)  # dslkf <NUM> dkf23 12asd <NUM> <NUM> <NUM>
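
As for why the original attempt skips "23": a likely explanation is that \s consumes the whitespace it matches, so the space after "2" is already used up when the scan reaches "23". A minimal sketch comparing what each pattern finds (the variable names are only illustrative):

```python
import re

s = "dslkf 234 dkf23 12asd 2 23 4"

# Original attempt: each match swallows the whitespace on both sides,
# so the space between "2" and "23" cannot also start the next match.
consuming = re.findall(r"\s\d+\s", s)
print(consuming)  # [' 234 ', ' 2 ']

# \b is zero-width: it asserts a boundary without consuming any character.
boundary = re.findall(r"\b\d+\b", s)
print(boundary)  # ['234', '2', '23', '4']
```

Because matches never overlap, any pattern that consumes the separators will miss every second number in a run like "2 23 4".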

CodePudding user response：

I found the answer myself. Lookbehind and lookahead are very helpful here, and for the end of the string I used the $ anchor. The code looks like this:

pattern = r"(?<=\s)\d+(?=\s|$)"

new_s = re.sub(pattern, '<NUM>', s)

Although I found the answer before posting, I couldn't find a similar question, so I posted it anyway for future readers.
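
One caveat with the lookbehind (?<=\s): it requires a whitespace character before the number, so a number at the very start of the string is not replaced. A possible variant, sketched here with a hypothetical helper name, uses negative lookarounds on \S so that both ends of the string are covered:

```python
import re

def replace_standalone_numbers(text, token="<NUM>"):
    # (?<!\S): not preceded by a non-whitespace character (also true at the start)
    # (?!\S):  not followed by a non-whitespace character (also true at the end)
    return re.sub(r"(?<!\S)\d+(?!\S)", token, text)

leading = replace_standalone_numbers("12 dslkf 234 dkf23 12asd 2 23 4")
print(leading)  # <NUM> dslkf <NUM> dkf23 12asd <NUM> <NUM> <NUM>

# The pattern from the answer above leaves a leading number untouched.
original = re.sub(r"(?<=\s)\d+(?=\s|$)", "<NUM>", "12 abc 7")
print(original)  # 12 abc <NUM>

# Unlike \b\d+\b, whitespace-based lookarounds leave "3.14" alone.
decimal = replace_standalone_numbers("pi is 3.14")
print(decimal)  # pi is 3.14
```

Which behaviour is right depends on what counts as "standalone" in your data: \b treats punctuation as a boundary, while the whitespace lookarounds only accept numbers delimited by whitespace or the string ends.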

CodePudding user response：

You don't necessarily need a regex for this. Here is a faster alternative using split() and join():

data = "dslkf 234 dkf23 12asd 2 23 4"

new_data = " ".join(word if not word.isdigit() else "<NUM>" for word in data.split())
print(new_data)  # dslkf <NUM> dkf23 12asd <NUM> <NUM> <NUM>

We split the string into words, and for each word we check whether it consists only of digits. If so, we replace it with <NUM>.
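
Two details of this approach are worth knowing before relying on it. The sketch below wraps it in a hypothetical helper and swaps isdigit() for isdecimal(), which is an assumption on my part about what "number" should mean here:

```python
def replace_number_tokens(text, token="<NUM>"):
    # str.split() with no arguments splits on any run of whitespace,
    # so tabs, newlines and repeated spaces all become single spaces after join().
    return " ".join(token if word.isdecimal() else word for word in text.split())

normalized = replace_number_tokens("a  12\tb\n7")
print(repr(normalized))  # 'a <NUM> b <NUM>'

# isdigit() also accepts characters such as superscript two, which the regex \d
# does not match; isdecimal() is the stricter check.
superscript_two = "\u00b2"
print(superscript_two.isdigit(), superscript_two.isdecimal())  # True False
```

If the original spacing matters, the regex versions are the safer choice, since re.sub() leaves everything outside the matches untouched.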