How can I split strings so that only numbers are selected?

Please excuse the change to this question.

I want to split a string (e.g. text1, text2), so that only numbers are output:

I tried the following:

import re

# example text1

text1 = " climb   -  95/ 85     0.18   low     -  4680"

split_text1 = re.split(" +", text1)

print(split_text1)

['', 'climb', '-', '95/', '85', '0.18', 'low', '-', '4680']

# example text2

text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"

split_text2 = re.split("  +", text2)

print(split_text2)

['CD 3 TO', 'F TO GD', '.80000E+02', '.00000E+00', '.00000E+00', '.00000E+00 /']

How can I get as result:

# split_text1 = ['95', '85', '0.18', '4680']

# split_text2 = ['3', '80.0', '0.0', '0.0', '0.0']

CodePudding user response：

Simply add a second space before the +, so the pattern only splits on runs of two or more spaces. This stops the 95/ 85 from being split. If you want \n at the end of the last item, add text += "\n".

import re

text = " climb   -  95/ 85     0.18   low     -  4680"

text = "a " + text

text += "\n"

split_text = re.split("  +", text)

if split_text[0] == "a":
  split_text[0] = ""
else:
  split_text[0] = split_text[0][2:]

print(split_text)

CodePudding user response：

First version of the question:

You can split on runs of at least two whitespace characters:

import re

text = " climb   -  95/ 85     0.18   low     -  4680"

split_text = re.split(r"\s{2,}", text)

print(split_text)
# [' climb', '-', '95/ 85', '0.18', 'low', '-', '4680']

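This also works without regex, although runs of more than two spaces leave empty strings and stray leading spaces in the result: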

text = " climb   -  95/ 85     0.18   low     -  4680"

split_text = text.split('  ')

print(split_text)
# [' climb', ' -', '95/ 85', '', ' 0.18', ' low', '', ' -', '4680']

With a little more processing, you can also strip the leftover spaces:

text = " climb   -  95/ 85     0.18   low     -  4680"

split_text = list(map(lambda x: x.strip(), text.split('  ')))

print(split_text)
# ['climb', '-', '95/ 85', '', '0.18', 'low', '', '-', '4680']
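
As a small follow-up sketch, the empty strings left by longer runs of spaces can be filtered out in the same pass. The helper name split_columns is made up for illustration and is not part of the original answer.

```python
def split_columns(text):
    # Split on double spaces, strip each piece, and keep only the non-empty ones.
    return [part.strip() for part in text.split('  ') if part.strip()]

text = " climb   -  95/ 85     0.18   low     -  4680"
print(split_columns(text))
# ['climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
```

This gives the same columns as the regex split, without the leading space on the first item.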

Revised question

You need to match digits (\d in regex). Some numbers are floats (so we need to match a single dot), and some are in exponential notation (so we need to match E+).

Something like this should be a good start:

import re

regex = r'[\d.E+-]+' # try to match `E` and negatives

text1 = " climb   -  95/ 85     0.18   low     -  4680"
text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"

results1 = re.findall(regex, text1)
# ['-', '95', '85', '0.18', '-', '4680']

results2 = re.findall(regex, text2)
# ['3', '.80000E+02', '.00000E+00', '.00000E+00', '.00000E+00']

This also matches a lone - with no digits. We can be more specific about negative numbers by requiring at least one digit right after the optional minus:

import re

regex = r'-?\d+[\d.E+-]*'

text1 = " climb   -  95/ 85     0.18   low     -  4680"
text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"

results1 = re.findall(regex, text1)
# ['95', '85', '0.18', '4680']

results2 = re.findall(regex, text2)
# ['3', '80000E+02', '00000E+00', '00000E+00', '00000E+00']

You then need to convert the exponential notation to floats; again, a map will do it:

import re

regex = r'-?\d+[\d.E+-]*'

text1 = " climb   -  95/ 85     0.18   low     -  4680"
text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"

results1 = list(map(float, re.findall(regex, text1)))
# [95.0, 85.0, 0.18, 4680.0]

results2 = list(map(float, re.findall(regex, text2)))
# [3.0, 8000000.0, 0.0, 0.0, 0.0]

To get closer to your expected output (strings, with only the exponential values reformatted):

import re

regex = r'-?\d+[\d.E+-]*'

def transform(value):
    # Only values in exponential notation are reformatted; the rest stay as matched.
    if 'E' in value:
        return str(float(value))

    return value

text1 = " climb   -  95/ 85     0.18   low     -  4680"
text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"

results1 = list(map(transform, re.findall(regex, text1)))
# ['95', '85', '0.18', '4680']

results2 = list(map(transform, re.findall(regex, text2)))
# ['3', '8000000.0', '0.0', '0.0', '0.0']

I just noticed that my regex misses the leading dot, which is why .80000E+02 came out as 8000000.0 instead of 80.0. A stricter pattern fixes that:

import re

regex = r'-?(?:\d*\.\d+|\d+)(?:E[+-]\d+)?'

def transform(value):
    # Only values in exponential notation are reformatted; the rest stay as matched.
    if 'E' in value:
        return str(float(value))

    return value

text1 = " climb   -  95/ 85     0.18   low     -  4680"
text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"

results1 = list(map(transform, re.findall(regex, text1)))
# ['95', '85', '0.18', '4680']

results2 = list(map(transform, re.findall(regex, text2)))
# ['3', '80.0', '0.0', '0.0', '0.0']

To explain a little:

-? the number may start with a minus sign.

(?:...) a non-capturing group: it groups alternatives without changing what findall returns.

\d*\.\d+ matches a dot followed by at least one digit, optionally with digits before the dot.

| plain alternation ("or").

\d+ matches one or more digits.

(?:\d*\.\d+|\d+) everything together: a non-capturing group that matches any float or any integer.

[+-] matches either + or -.

(?:E[+-]\d+)? much the same: a non-capturing group that matches an E followed by + or - and then an integer. The trailing ? makes the whole group optional.
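
As a rough sketch, the pattern described above (with its + quantifiers written out) can also be compiled with re.VERBOSE so that each part carries its own comment. The name NUMBER is chosen here for illustration, and the example assumes the exponents in text2 are written as E+02 and E+00.

```python
import re

# The same pattern, spread over several lines. In verbose mode, whitespace
# outside character classes is ignored and # starts a comment.
NUMBER = re.compile(r"""
    -?                    # optional leading minus
    (?: \d*\.\d+ | \d+ )  # a float such as .8 or 0.18, or a plain integer
    (?: E[+-]\d+ )?       # optional exponent such as E+02
""", re.VERBOSE)

text1 = " climb   -  95/ 85     0.18   low     -  4680"
text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"

print(NUMBER.findall(text1))
# ['95', '85', '0.18', '4680']

print(NUMBER.findall(text2))
# ['3', '.80000E+02', '.00000E+00', '.00000E+00', '.00000E+00']
```

Because the pattern has no capturing groups, findall returns the full text of each match.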

CodePudding user response：

You could use findall to get the numeric patterns and convert the strings to float or int:

import re
def getNums(S):
    pattern = r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"
    result = []
    for part in re.findall(pattern, S):
        try:
            result.append(float(part))  # store the value as a float first
            result[-1] = int(part)      # replace it with an int when the text allows
        except ValueError:
            pass
    return result
                
text = " climb   -  95/ 85     0.18   low     -  4680"
print(getNums(text))
# [95, 85, 0.18, 4680]

text2 = "CD 3 TO   F TO GD   .80000E+02   .00000E+00   .00000E+00   .00000E+00 /"
print(getNums(text2))
# [3, 80.0, 0.0, 0.0, 0.0]

I'm assuming you want the output to be all numeric values rather than a mix of reformatted strings and numerics.

Here's a breakdown of the expression:

[+-]? Optional leading sign
(?:[0-9]+\.?[0-9]*|\.[0-9]+) Mandatory central part (non-capturing group)
- ...|... either start with a digit or with a decimal point
- [0-9]+\.?[0-9]* start with digit(s), followed by an optional decimal point and optional fractional digits
- \.[0-9]+ start with a decimal point followed by one or more digits (i.e. a decimal point with no digits on either side is not a number)
(?:[Ee][+-]?[0-9]+)? Optional exponent part (non-capturing group)
- E or e to indicate the start of the exponent part
- [+-]? optional sign of the exponent
- [0-9]+ mandatory exponent digits
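
To see the breakdown in action, here is a short sketch that runs the pattern (with the + quantifiers written out) over a few made-up tokens. The sample string and the name PATTERN are illustrative assumptions, not part of the original answer.

```python
import re

PATTERN = r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"

# Signed values, a leading dot, a trailing dot, both exponent spellings,
# plus a lone dot and a word that should not match.
samples = "-12 +3.5 .25 7. 1e3 -4.2E-1 . abc"

matches = re.findall(PATTERN, samples)
print(matches)
# ['-12', '+3.5', '.25', '7.', '1e3', '-4.2E-1']

print([float(m) for m in matches])
# [-12.0, 3.5, 0.25, 7.0, 1000.0, -0.42]
```

The lone dot is skipped because both alternatives of the central part require at least one digit.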