Please excuse the change to this question.
I want to split a string (e.g. text1 or text2 below) so that only the numbers are output.
I tried the following:
import re
# example text1
text1 = " climb - 95/ 85 0.18 low - 4680"
split_text1 = re.split(" ", text1)
print(split_text1)
['', 'climb', '-', '95/', '85', '0.18', 'low', '-', '4680']
# example text2
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"
split_text2 = re.split(" ", text2)
print(split_text2)
['CD 3 TO', 'F TO GD', '.80000E+02', '.00000E+00', '.00000E+00', '.00000E+00 /']
How can I get the following result?
# split_text1 = ['95', '85', '0.18', '4680']
# split_text2 = ['3', '80.0', '0.0', '0.0', '0.0']
CodePudding user response:
Simply add a second space before the . This will stop the 95/ 85 from being split. If you want \n at the end of the last item, add text += "\n".
import re
text = " climb - 95/ 85 0.18 low - 4680"
text = "a " + text
text += "\n"
split_text = re.split(" ", text)
if split_text[0] == "a":
    split_text[0] = ""
else:
    split_text[0] = split_text[0][2:]
print(split_text)
CodePudding user response:
First version of the question:
You can split on runs of at least two whitespace characters:
import re
text = " climb - 95/ 85 0.18 low - 4680"
split_text = re.split(r"\s{2,}", text)
print(split_text)
# [' climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
This also works without a regex:
text = " climb - 95/ 85 0.18 low - 4680"
split_text = text.split(' ')
print(split_text)
# [' climb', ' -', '95/ 85', '', ' 0.18', ' low', '', ' -', '4680']
With a little more manipulation, you can also remove the extra spaces:
text = " climb - 95/ 85 0.18 low - 4680"
split_text = list(map(lambda x: x.strip(), text.split(' ')))
print(split_text)
# ['climb', '-', '95/ 85', '', '0.18', 'low', '', '-', '4680']
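As a side note to the variants above, here is a minimal sketch comparing a split on runs of whitespace with a plain str.split(). It assumes the fields in the sample are separated by two spaces, with a single space inside "95/ 85"; that exact spacing is an assumption made for this illustration.

```python
import re

# Assumed sample: two spaces between fields, one space inside "95/ 85".
text = "  climb  -  95/ 85  0.18  low  -  4680"

# Trim the ends first, then split on runs of two or more whitespace characters.
fields = re.split(r"\s{2,}", text.strip())
print(fields)
# ['climb', '-', '95/ 85', '0.18', 'low', '-', '4680']

# str.split() with no argument splits on any whitespace run and drops empty strings.
tokens = text.split()
print(tokens)
# ['climb', '-', '95/', '85', '0.18', 'low', '-', '4680']
```

Stripping before splitting avoids the empty first element that a leading separator would otherwise produce.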
Revised question
You need to match numbers (\d in regex); some are floats (so we need to match a single dot) and some use exponential notation (so we need to match E).
Something like this should be a good start:
import re
regex = r'[\d.E+-]+' # try to match `E` and negatives
text1 = " climb - 95/ 85 0.18 low - 4680"
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"
results1 = re.findall(regex, text1)
# ['-', '95', '85', '0.18', '-', '4680']
results2 = re.findall(regex, text2)
# ['3', '.80000E+02', '.00000E+00', '.00000E+00', '.00000E+00']
This also matches a lone - with no digits, so we can be more specific about negative numbers:
import re
regex = r'-?\d+[\d.E+-]*'
text1 = " climb - 95/ 85 0.18 low - 4680"
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"
results1 = re.findall(regex, text1)
# ['95', '85', '0.18', '4680']
results2 = re.findall(regex, text2)
# ['3', '80000E+02', '00000E+00', '00000E+00', '00000E+00']
You then need to convert the exponential strings to floats; again, a map should do it:
import re
regex = r'-?\d+[\d.E+-]*'
text1 = " climb - 95/ 85 0.18 low - 4680"
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"
results1 = list(map(float, re.findall(regex, text1)))
# [95.0, 85.0, 0.18, 4680.0]
results2 = list(map(float, re.findall(regex, text2)))
# [3.0, 8000000.0, 0.0, 0.0, 0.0]
To get closer to your expected output:
import re
regex = r'-?\d+[\d.E+-]*'
def transform(value):
    # Only values in exponential notation are converted; the rest stay as matched.
    if 'E' in value:
        return str(float(value))
    return value
text1 = " climb - 95/ 85 0.18 low - 4680"
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"
results1 = list(map(transform, re.findall(regex, text1)))
# ['95', '85', '0.18', '4680']
results2 = list(map(transform, re.findall(regex, text2)))
# ['3', '8000000.0', '0.0', '0.0', '0.0']
And I only now see that my regex misses the leading dot:
import re
regex = r'-?(?:\d*\.\d+|\d+)(?:E[+-]\d+)?'
def transform(value):
    # Only values in exponential notation are converted; the rest stay as matched.
    if 'E' in value:
        return str(float(value))
    return value
text1 = " climb - 95/ 85 0.18 low - 4680"
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"
results1 = list(map(transform, re.findall(regex, text1)))
# ['95', '85', '0.18', '4680']
results2 = list(map(transform, re.findall(regex, text2)))
# ['3', '80.0', '0.0', '0.0', '0.0']
To explain a little:
-? means the number may start with a minus sign.
(?: ) is a non-capturing group; it lets us group alternatives without changing what findall returns.
\d*\.\d+ matches a dot followed by at least one digit, optionally with digits before the dot.
| is a simple "or".
\d+ matches one or more digits.
(?:\d*\.\d+|\d+) puts it all together: a non-capturing group that matches any float or any integer.
[+-] matches either + or -.
(?:E[+-]\d+)? is much the same: a non-capturing group that matches an E followed by + or - and then an integer. The trailing ? means the group may appear once or not at all.
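The same explanation can also live inside the pattern itself. The following is a minimal sketch using re.VERBOSE, which lets a pattern carry whitespace and comments; the name NUMBER is only illustrative, and the sample assumes the exponents are written as E+02 and E+00.

```python
import re

# The pattern explained above, written with re.VERBOSE so each part has a comment.
NUMBER = re.compile(r"""
    -?                  # optional minus sign
    (?:\d*\.\d+|\d+)    # a float such as .8 or 0.18, or a plain integer
    (?:E[+-]\d+)?       # optional exponent such as E+02
""", re.VERBOSE)

text1 = " climb - 95/ 85 0.18 low - 4680"
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"

matches1 = NUMBER.findall(text1)
print(matches1)
# ['95', '85', '0.18', '4680']

matches2 = NUMBER.findall(text2)
print(matches2)
# ['3', '.80000E+02', '.00000E+00', '.00000E+00', '.00000E+00']
```

Because every group is non-capturing, findall returns the whole match for each number rather than a tuple of groups.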
CodePudding user response:
You could use findall to get the numeric patterns and convert the strings to float or int:
import re
def getNums(S):
    pattern = r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"
    result = []
    for part in re.findall(pattern, S):
        try:
            # Store as float first, then replace with int if the text is a plain integer.
            result.append(float(part))
            result[-1] = int(part)
        except ValueError:
            pass
    return result
text = " climb - 95/ 85 0.18 low - 4680"
print(getNums(text))
# [95, 85, 0.18, 4680]
text2 = "CD 3 TO F TO GD .80000E+02 .00000E+00 .00000E+00 .00000E+00 /"
print(getNums(text2))
# [3, 80.0, 0.0, 0.0, 0.0]
I'm assuming you want the output to be all numeric values rather than a mix of reformatted strings and numerics.
Here's a breakdown of the expression:
[+-]? : optional leading sign
(?:[0-9]+\.?[0-9]*|\.[0-9]+) : mandatory central part (non-capturing group)
    ...|... : either start with a digit or with a decimal point
    [0-9]+\.?[0-9]* : start with digit(s), with optional decimal point and optional fractional digits
    \.[0-9]+ : start with a decimal point followed by one or more digits (i.e. a decimal point without digits on the left or right is not a number)
(?:[Ee][+-]?[0-9]+)? : optional exponent part (non-capturing group)
    [Ee] : E or e to indicate start of exponent part
    [+-]? : optional sign of exponent
    [0-9]+ : mandatory exponent digits
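To see how the optional sign and exponent parts behave, here is a small sketch that reuses the same pattern on a made-up input. The sample string and the name extract_numbers are assumptions for illustration only; they are not from the question.

```python
import re

# Same expression as in the breakdown above, compiled once for reuse.
PATTERN = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?")

def extract_numbers(s):
    """Return an int for plain integer tokens and a float for everything else."""
    numbers = []
    for token in PATTERN.findall(s):
        try:
            numbers.append(int(token))
        except ValueError:
            # Tokens with a decimal point or an exponent are not valid ints.
            numbers.append(float(token))
    return numbers

# Hypothetical input covering signs, a lowercase exponent and a leading dot.
sample = "a -12 b +3.5 c 1e3 d -.25E-1"
print(extract_numbers(sample))
# [-12, 3.5, 1000.0, -0.025]
```

Note that a hyphen directly in front of digits is read as a sign, so an input such as "95-85" would yield 95 and -85.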