I have a list of string values that can contain numbers or just words, and I want to join all adjacent strings that can't be converted to a number (int or float) with a separator character such as -. So the output would be:
['string1', '862.4', 'string2', '755.84', 'string3-string4', '56.8']
I came up with the following solution, which returns an empty list []. I reviewed the indexes and edge cases but can't find a way to fix it. At first I thought it would throw an IndexError exception, but the code runs fine, apart from the wrong output it gives:
s = "string1 862.4 string2 755.84 string3 string4 56.8"
l = s.split(" ")
out = []
for i in range(len(l)):
    if l[i-1].isalpha() and l[i].isalpha():
        out[i-1] = f"{out[i-1]}-{out[i]}"
print(out)
Actual output:
[]
CodePudding user response:
This is something you could solve with a regex. Here's a fairly concise solution that I think covers the edge cases likely to come up in your use case:
import re
# s is the input string from the question
regex = re.compile(r'(?: |^)(-?\d+(?:\.\d+)?)(?: |$)')
final_array = [x.replace(' ', '-') for x in re.split(regex, s) if x != '']
# >>> final_array
# ['string1', '862.4', 'string2', '755.84', 'string3-string4', '56.8']
The central part of that regex, (-?\d+(?:\.\d+)?), matches a number with an optional sign and an optional fractional part, i.e. any of the following examples: '0', '-2', '1.50', '-3.1415'.
The x != '' check is needed because when a number falls at the start or end of the string, re.split puts an empty string at that end of the result.
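To make the empty-string behaviour concrete, here is a small sketch on a made-up input where numbers sit at both ends. It assumes the pattern uses + quantifiers after each \d; the name NUMBER and the sample string are illustrative only.

```python
import re

# Hypothetical name; the pattern assumes "+" quantifiers after each \d.
NUMBER = re.compile(r'(?: |^)(-?\d+(?:\.\d+)?)(?: |$)')

# Numbers at both ends of the input produce empty strings at both ends
# of the split result, because the text "outside" each match is empty.
parts = re.split(NUMBER, "12.5 alpha beta -3")
print(parts)    # ['', '12.5', 'alpha beta', '-3', '']

# Dropping the empty strings and swapping spaces for dashes gives the final list.
cleaned = [p.replace(' ', '-') for p in parts if p != '']
print(cleaned)  # ['12.5', 'alpha-beta', '-3']
```

Because the pattern has a capturing group, re.split keeps the matched numbers in the result instead of discarding them.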
CodePudding user response:
The docs say about isalpha:
Return True if all characters in the string are alphabetic
This is not true for any of the words in your input, since each one contains at least one digit. If it were, your code would raise an IndexError at out[i-1], because that index does not exist in out: out is empty.
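A quick check along these lines shows why the condition never holds for this input. The variable names are illustrative only.

```python
words = "string1 862.4 string2 755.84 string3 string4 56.8".split(" ")

# Map each word to the result of isalpha(); a single digit or dot makes it False.
alpha_flags = {w: w.isalpha() for w in words}
print(alpha_flags)

print(any(alpha_flags.values()))  # False: no word is purely alphabetic
print("string".isalpha())         # True: letters only
```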
The body of the if block is never executed, so the error inside it stays hidden. Even so, the right-hand side of the assignment should reference l[i], not out[i], and you should use out.append, not out[i-1] =.
Here is a correction:
s = "string1 862.4 string2 755.84 string3 string4 56.8"
l = s.split(" ")
out = []
addnew = True
for word in l:
    try:
        float(word)
    except ValueError:  # It's not a number
        if addnew:  # First time in series?
            out.append(word)
            addnew = False
        else:  # Extend the non-number string series
            out[-1] += f"-{word}"
    else:  # It's a number
        out.append(word)
        addnew = True
print(out)
CodePudding user response:
A convenient way to detect a number is to try converting the word with float. A while loop then lets you decide whether to append a joined pair and skip the word that has already been merged.
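s = "string1 862.4 string2 755.84 string3 string4 56.8"
l = s.split(" ")

def is_num(num):
    # True if num parses as a float (this also covers ints)
    try:
        float(num)
        return True
    except ValueError:
        return False

out, i = [], 1
while i < len(l) + 1:
    # The bounds check on i avoids an IndexError when the last word is not a number
    if i < len(l) and not (is_num(l[i-1]) or is_num(l[i])):
        out.append(f"{l[i-1]}-{l[i]}")  # merge a pair of adjacent words
        i += 1  # skip the word that was just merged
    else:
        out.append(l[i-1])
    i += 1
print(out)
# ['string1', '862.4', 'string2', '755.84', 'string3-string4', '56.8']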
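The loop above joins words two at a time, so a run of three or more words would come out split. A possible variant, sketched here with a hypothetical helper name join_runs, uses an inner loop to consume the whole run before appending:

```python
def is_num(text):
    # True if text parses as a float (this also covers ints)
    try:
        float(text)
        return True
    except ValueError:
        return False

def join_runs(words, sep="-"):
    # Hypothetical helper: joins each run of consecutive non-numeric words.
    out, i = [], 0
    while i < len(words):
        if is_num(words[i]):
            out.append(words[i])
            i += 1
        else:
            j = i
            # Advance j to the end of the current run of non-numbers
            while j < len(words) and not is_num(words[j]):
                j += 1
            out.append(sep.join(words[i:j]))
            i = j
    return out

print(join_runs("string1 862.4 string2 755.84 string3 string4 56.8".split()))
# ['string1', '862.4', 'string2', '755.84', 'string3-string4', '56.8']
print(join_runs("a b c 1 d".split()))
# ['a-b-c', '1', 'd']
```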
CodePudding user response:
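Using the itertools and re modules. groupby merges consecutive items whose keys compare equal: every word that fails the match yields None, so adjacent words are grouped together, while each number yields its own match object and stays separate.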
from re import fullmatch
from itertools import groupby
s = "string1 string0 862.4 111 555 string2 755.84 string3 string4 string5 56.8"
out = ['-'.join(g) for _,g in groupby(s.split(), lambda x: fullmatch(r'[\d\.]+',x))]
>>> out
['string1-string0',
'862.4',
'111',
'555',
'string2',
'755.84',
'string3-string4-string5',
'56.8']
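The choice of key matters here. The following sketch, on a made-up input, compares a key that returns the match object with one that wraps it in bool. Match objects compare by identity, so two adjacent numbers never share a key; with a bool key they do and get joined as well.

```python
from itertools import groupby
from re import fullmatch

words = "a b 1 2 c".split()

# Key is a Match object for numbers (distinct each time) or None for words.
by_match = ['-'.join(g) for _, g in groupby(words, lambda w: fullmatch(r'[\d.]+', w))]

# Key is a plain bool, so adjacent numbers compare equal and are merged too.
by_bool = ['-'.join(g) for _, g in groupby(words, lambda w: bool(fullmatch(r'[\d.]+', w)))]

print(by_match)  # ['a-b', '1', '2', 'c']
print(by_bool)   # ['a-b', '1-2', 'c']
```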