I have a sentence in which numbers (integers and floats) are often merged with adjacent words. I want to separate the numbers from the text and join the pieces back into a single space-separated sentence.
The following gets part of the way there:
import re
str1 = "test1.25nb 5test .5NB 00.5my_test 5unit 5.6"
re.findall(r'\d*\.*\d+\.*\d*', str1)
re.split(r'\d*\.*\d+\.*\d*', str1)
However, I could not figure out a clean way to combine these into the result I want.
Input: str1="test1.25nb 5test .5NB 00.5my_test 5unit 5.6"
Expected output: "test 1.25 nb 5 test .5 NB 00.5 my_test 5 unit 5.6"
Thanks in advance.
CodePudding user response:
You can use
import re
str1 = "test1.25nb 5test .5NB 00.5my_test 5unit 5.6"
print( " ".join(re.split(r'\s*(\d*\.?\d+)\s*', str1)) )
# => test 1.25 nb 5 test .5 NB 00.5 my_test 5 unit 5.6
Or, directly using re.sub with strip() at the end:
print( re.sub(r'\s*(\d*\.?\d+)\s*', r' \1 ', str1).strip() )
See the Python demo. The \s*(\d*\.?\d+)\s* regex matches
\s* - zero or more whitespaces
(\d*\.?\d+) - captures into Group 1 (and hence these values are also present in the resulting list produced with re.split) zero or more digits, an optional . and one or more digits
\s* - zero or more whitespaces.
See the regex demo.
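To make the role of the capturing group concrete, here is a minimal sketch that prints the raw list re.split returns and then joins it. The names NUMBER and split_numbers are hypothetical, chosen only for illustration; the pattern is the one described above. Because the input ends with a number, the split list ends with an empty string, so the sketch filters empty pieces before joining.

```python
import re

# Same pattern as in the answer: optional whitespace around a captured number.
NUMBER = re.compile(r'\s*(\d*\.?\d+)\s*')

def split_numbers(text):
    # re.split keeps Group 1 matches in the result list;
    # empty strings (e.g. after a trailing number) are dropped before joining.
    parts = [p for p in NUMBER.split(text) if p]
    return " ".join(parts)

str1 = "test1.25nb 5test .5NB 00.5my_test 5unit 5.6"
print(NUMBER.split(str1))   # text and number pieces, with a trailing ''
print(split_numbers(str1))  # test 1.25 nb 5 test .5 NB 00.5 my_test 5 unit 5.6
```

Filtering the empty pieces is one way to avoid a stray leading or trailing space; calling strip() on the joined string, as in the re.sub variant, should work equally well.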
CodePudding user response:
If you are not tied to regular expressions, this may be a bit easier to understand:
import string

str1 = "test1.25nb 5test .5NB 00.5my_test 5unit 5.6"
cnt = len(str1)
str2 = ""
# characters that can be part of a number
numdigits = string.digits + "."
print(str1)
for i, c in enumerate(str1):
    str2 += c
    if i < cnt - 1:
        # peek at the next character
        nextc = str1[i + 1]
        # insert a space on a number-to-letter or letter-to-number change
        if (c in numdigits and nextc in string.ascii_letters) or (c in string.ascii_letters and nextc in numdigits):
            str2 += " "
print(str2)
The basic logic is simple: for each character, peek at the next character and see whether there is a change between alphabetic and numeric status. If so, insert a space.
Note that the enumerate(list) built-in function yields pairs of values: an index followed by the element of the list at that index. This can simplify the indexing process within a loop.
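As a small illustration, the sketch below first shows the pairs enumerate produces, then rewrites the same peek-ahead idea with zip(text, text[1:]), which pairs each character with its successor and avoids the explicit index check. The function name add_spaces is hypothetical, and the zip variant is an alternative formulation rather than the code from this answer.

```python
import string

# enumerate yields (index, element) pairs
print(list(enumerate("ab5")))  # [(0, 'a'), (1, 'b'), (2, '5')]

def add_spaces(text):
    numchars = string.digits + "."   # characters that can be part of a number
    letters = string.ascii_letters
    out = []
    # zip(text, text[1:]) pairs every character with the one that follows it
    for c, nextc in zip(text, text[1:]):
        out.append(c)
        if (c in numchars and nextc in letters) or (c in letters and nextc in numchars):
            out.append(" ")
    out.append(text[-1:])  # the last character has no successor
    return "".join(out)

str1 = "test1.25nb 5test .5NB 00.5my_test 5unit 5.6"
print(add_spaces(str1))  # test 1.25 nb 5 test .5 NB 00.5 my_test 5 unit 5.6
```

Collecting pieces in a list and joining once at the end is a common alternative to repeated string concatenation.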