I want to split the following text:
import re
text = " climb - 95/ 85 0.18 low - 4680"
split_text = re.split(" ", text)
print(split_text)
['', 'climb', '-', '95/', '85', '0.18', 'low', '-', '4680']
My problem is that "95/ 85" should not be split.
How can I get this result?
# split_text = ['', 'climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
Answer:
Simply add a second space before the . This will stop the 95/ 85 from being split. If you want \n at the end of the last item, add text += "\n".
import re
text = " climb - 95/ 85 0.18 low - 4680"
text = "a " + text  # prepend "a "; it is blanked out of the first item below
text += "\n"        # keep a newline at the end of the last item
split_text = re.split(" ", text)
if split_text[0] == "a":
    split_text[0] = ""
else:
    split_text[0] = split_text[0][2:]
print(split_text)
Answer:
There could be many rules that would apply to your single example but still be wrong for the pattern of data you actually have to process, so you're forcing us to guess what the rule behind the 95/ 85 exception is.
Here's a wild guess: spaces following a forward slash are not to be treated as separators.
In that case, you could handle it with a negative lookbehind:
import re
text = " climb - 95/ 85 0.18 low - 4680"
split_text = re.split(r"(?<!\/) ", text)
print(split_text)
['', 'climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
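The same guess can be made a little more robust if the real data may contain several spaces in a row. The sketch below assumes (this is not stated in the question) that any run of whitespace is a separator unless it directly follows a slash; `split_fields` and `SEPARATOR` are hypothetical names. Putting `\s` inside the lookbehind class means a match can only start at the beginning of a whitespace run, so even a slash followed by two spaces stays together.

```python
import re

# Hypothetical pattern: a run of whitespace is a separator only if the
# character before it is neither "/" nor whitespace, i.e. the match must
# begin at the start of a run that does not follow a slash.
SEPARATOR = re.compile(r"(?<![/\s])\s+")


def split_fields(text):
    """Split on whitespace runs, keeping 'NN/ NN' values together."""
    return SEPARATOR.split(text)


print(split_fields(" climb - 95/ 85 0.18 low - 4680"))
# ['', 'climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
print(split_fields("climb   -  95/ 85   0.18"))
# ['climb', '-', '95/ 85', '0.18']
```

If the slash rule turns out to be wrong for the real data, only the lookbehind class needs to change.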
The exception rule could also be: the 4th and 5th values need to be combined.
In that case you could do this:
split_text = re.split(" ", text)
split_text[3:5] = [" ".join(split_text[3:5])]
print(split_text)
['', 'climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
Obviously different rules that give the right output for this example will produce different results for other strings. That's why you need to be specific.
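To see why this matters, the sketch below runs both guessed rules on the question's string and on a second, made-up string. The `other` input and the helper names `by_lookbehind` and `by_position` are hypothetical, chosen only for illustration.

```python
import re


def by_lookbehind(text):
    # Rule 1: a space directly after "/" is not a separator.
    return re.split(r"(?<!/) ", text)


def by_position(text):
    # Rule 2: split on every space, then always merge tokens 4 and 5
    # (indices 3 and 4).
    parts = re.split(" ", text)
    parts[3:5] = [" ".join(parts[3:5])]
    return parts


original = " climb - 95/ 85 0.18 low - 4680"
# Hypothetical second input: same kind of data, fields in a different order.
other = " descend - 0.18 95/ 85 low - 4680"

for text in (original, other):
    print(by_lookbehind(text))
    print(by_position(text))
```

Both rules agree on `original`, but on `other` the lookbehind keeps '95/ 85' together while the positional rule produces '0.18 95/' and '85'.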
Answer:
If the fields in your real data are separated by runs of two or more spaces (while 95/ 85 contains only one), you can split on at least 2 spaces:
import re
text = " climb   -  95/ 85     0.18   low     -  4680"
split_text = re.split(r"\s{2,}", text)
print(split_text)
# [' climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
It also works without regex, by splitting on exactly two spaces:
text = " climb   -  95/ 85     0.18   low     -  4680"
split_text = text.split('  ')
print(split_text)
# [' climb', ' -', '95/ 85', '', ' 0.18', ' low', '', ' -', '4680']
With a little more work, you can also strip the extra spaces from each item:
text = " climb   -  95/ 85     0.18   low     -  4680"
split_text = [item.strip() for item in text.split('  ')]
print(split_text)
# ['climb', '-', '95/ 85', '', '0.18', 'low', '', '-', '4680']
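If the empty strings are not wanted either, one option is to filter them out in the same comprehension. This is a sketch that assumes, as in the string below, that fields are separated by two or more spaces.

```python
text = " climb   -  95/ 85     0.18   low     -  4680"

# Split on exactly two spaces, strip each piece, and drop pieces that are
# empty after stripping (these come from longer runs of spaces).
split_text = [part.strip() for part in text.split("  ") if part.strip()]
print(split_text)
# ['climb', '-', '95/ 85', '0.18', 'low', '-', '4680']
```

This gives the same items as the `\s{2,}` regex version, with the leading space removed from 'climb'.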