What is the best way to remove words in a string that start with numbers and contain periods in Python?
this_string = 'lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here and even more text here v7.8.989'
If I use Regex:
re.sub('[0-9]*\.\w*', '', this_string)
The result will be:
'lorum3 ipsum  bar foo  v more text 46  here and even more text here v'
I expect the word v7.8.989 not to be removed, since it starts with a letter.
It would also be great if removing the words didn't leave extra spaces behind. My regex above still leaves a doubled space wherever a word was removed.
CodePudding user response:
You can use this regex to match the strings you want to remove:
(?:^|\s)(?=[0-9]+\.[0-9.]*(?:\s|$))[0-9.]+
It matches:
(?:^|\s) : beginning of string or whitespace
(?=[0-9]+\.[0-9.]*(?:\s|$)) : a lookahead that asserts the next character is a digit; that there are only digits and periods between this character and the next whitespace or the end of the string; and that there is at least one period in this part
[0-9.]+ : some number of digits and periods
You can then replace any matches with the empty string. Because the match includes the whitespace before the word, no doubled space is left behind. In Python:
import re
this_string = 'lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here and even more text here v7.8.989 and also 1.2.3c as well'
result = re.sub(r'(?:^|\s)(?=[0-9]+\.[0-9.]*(?:\s|$))[0-9.]+', '', this_string)
print(result)
Output:
lorum3 ipsum bar foo v more text 46 here and even more text here v7.8.989 and also 1.2.3c as well
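The same pattern can also be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the names NUMERIC_DOTTED_WORD and remove_dotted_numbers are made up for illustration, and the final lstrip() is an added assumption to drop the space that would otherwise remain when the very first word of the string is removed (it also strips any leading whitespace the input already had).

```python
import re

# Hypothetical name; the pattern is the one from the answer above,
# split across lines with re.VERBOSE so each piece can be commented.
NUMERIC_DOTTED_WORD = re.compile(r"""
    (?:^|\s)                       # start of string or one whitespace character
    (?=[0-9]+\.[0-9.]*(?:\s|$))    # lookahead: digits, a period, then only digits or periods up to whitespace or end
    [0-9.]+                        # the digits and periods that make up the word
""", re.VERBOSE)


def remove_dotted_numbers(text):
    """Remove words that start with a digit and consist only of digits and periods."""
    # lstrip() is an addition (assumption): a match at the start of the
    # string has no preceding whitespace to consume, so a space is left over.
    return NUMERIC_DOTTED_WORD.sub('', text).lstrip()


this_string = ('lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here '
               'and even more text here v7.8.989 and also 1.2.3c as well')
print(remove_dotted_numbers(this_string))
```

Words such as v7.8.989 and 1.2.3c are kept because the lookahead requires a leading digit and allows nothing but digits and periods up to the next whitespace.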
CodePudding user response:
If you don't want to use regex, you can also do it using simple string operations:
res = ''.join(['' if (e.startswith(('0','1','2','3','4','5','6','7','8','9')) and '.' in e) else e + ' ' for e in this_string.split()])
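A variant of the string-based approach, written as a small function, might look like the sketch below. The function name drop_dotted_number_words is hypothetical. It filters the words first and joins them with single spaces, so there is no trailing space to clean up. Note that this test is looser than the regex in the other answer: any word that starts with a digit and contains a period is dropped, including something like 1.2.3c, and split() collapses all runs of whitespace.

```python
def drop_dotted_number_words(text):
    """Drop every word that starts with an ASCII digit and contains a period."""
    # split() never yields empty strings, so indexing w[0] is safe.
    kept = [w for w in text.split()
            if not (w[0] in '0123456789' and '.' in w)]
    # Joining with a single space avoids both doubled and trailing spaces.
    return ' '.join(kept)


this_string = ('lorum3 ipsum 15.2.3.9.7 bar foo 1. v more text 46 2. here '
               'and even more text here v7.8.989')
print(drop_dotted_number_words(this_string))
```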