Regex: drop numbers with some symbols

Time:04-16

I'm trying to clean up my text, so I need to remove some numbers and also some combinations of numbers and symbols.

I have a string

s = '4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282'

And I want to get

'pm from our side a more detailed analysis4'

I tried to use

re.compile(r'\b(?:/|-|\+|\:)(\d+)\b').sub(r' ', s)

but it returns

'4   2   pm from our side a more detailed analysis4 +7 (495) 797  77 '

What am I doing wrong, and how can I drop just the numbers and the combinations of numbers and specific symbols?

CodePudding user response:

You might match at least one non-word character surrounded by optional digits, and then trim the result:

(?<!\S)\d*(?:[^\w\s]+\d*)+\s*

Explanation

  • (?<!\S) Assert a whitespace boundary to the left
  • \d* Match optional digits
  • (?:[^\w\s]+\d*)+ Match 1+ times at least one non-word character followed by optional digits
  • \s* Match optional whitespace chars

Regex demo

import re

pattern = r"(?<!\S)\d*(?:[^\w\s]+\d*)+\s*"
s = "4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282 kl-1381033 substr1.substr2.ab-2021-44228.a"

print(re.sub(pattern, "", s))

Output

pm from our side a more detailed analysis4 kl-1381033 substr1.substr2.ab-2021-44228.a
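The "trim the result" step is not shown in the snippet above. Here is a minimal sketch of it, applied to the string from the question. It assumes the pattern uses `+` quantifiers after the character class and after the group, and the helper name `drop_numeric_tokens` is hypothetical:

```python
import re

# Assumed pattern: optional digits, then one or more runs of
# punctuation followed by optional digits, then trailing whitespace.
PATTERN = re.compile(r"(?<!\S)\d*(?:[^\w\s]+\d*)+\s*")


def drop_numeric_tokens(text: str) -> str:
    """Remove digit/punctuation tokens, then trim leftover whitespace."""
    return PATTERN.sub("", text).strip()


s = "4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282"
print(drop_numeric_tokens(s))
# pm from our side a more detailed analysis4
```

Note that a token made only of digits, such as `123`, contains no non-word character, so this pattern leaves it in place.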

CodePudding user response:

Try this expression :

(?:\/|-|\+|\:|^|\(|\)| )+?(\d+)

You can test it there : https://regex101.com/r/aANxBR/1

CodePudding user response:

It appears you want to remove words that start with digits and symbols.

You could do:

import re 

s = '4/13/2022 2:20:03 pm from our side a more detailed analysis4 +7 (495) 797-8700 77-8282 kl-1381033 substr1.substr2.ab-2021-44228.a'

>>> ' '.join(w for w in s.split() if not re.match(r'[\d(+]\S+', w))
'pm from our side a more detailed analysis4 kl-1381033 substr1.substr2.ab-2021-44228.a'

A pure-Python solution without regex:

bad_start='0123456789+('
>>> ' '.join(w for w in s.split() if w[0] not in bad_start)
'pm from our side a more detailed analysis4 kl-1381033 substr1.substr2.ab-2021-44228.a'