Concatenating strings inside a list

I have a list of string values that can contain numbers or just words and I want to join together all strings that can't be converted to a number format (int or float) with a separator character such as -. So the output would be:

['string1', '862.4', 'string2', '755.84', 'string3-string4', '56.8']

I have come up with the following solution that returns an empty list []. I tried to review indexes and edge cases but I can't find a way of fixing it. At first I thought that it was going to throw an IndexError exception, but the code just runs fine, except for the wrong output it gives:

s = "string1 862.4 string2 755.84 string3 string4 56.8"
l = s.split(" ")

out = []
for i in range(len(l)):
  if l[i-1].isalpha() and l[i].isalpha():
    out[i-1] = f"{out[i-1]}-{out[i]}"

print(out)

Actual output:

[]

CodePudding user response：

This is something you could potentially use regex to solve. Here's a solution I came up with which is quite concise and I think can account for edge cases that might arise in your use-case:

import re

regex = re.compile(r'(?: |^)(-?\d+(?:\.\d+)?)(?: |$)')
final_array = [x.replace(' ', '-') for x in re.split(regex, s) if x != '']

# >>> final_array
# ['string1', '862.4', 'string2', '755.84', 'string3-string4', '56.8']

The central part of that regex, (-?\d+(?:\.\d+)?), is just a check for numbers that may include a sign and a decimal part, i.e. any of the following: '0', '-2', '1.50', '-3.1415'.
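
As a quick check of that sub-pattern on its own (written here with + quantifiers and tested with fullmatch; the sample list is only an illustration, not part of the original answer):

```python
import re

# Standalone copy of the number sub-pattern: optional sign, digits,
# optional fractional part.
number = re.compile(r'-?\d+(?:\.\d+)?')

samples = ['0', '-2', '1.50', '-3.1415', 'string1', '1.', '.5']
matches = {s: bool(number.fullmatch(s)) for s in samples}

for sample, ok in matches.items():
    print(f"{sample!r}: {ok}")
```

The first four samples match; 'string1', '1.' and '.5' do not, because the pattern requires digits on both sides of the decimal point.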

The reason we check x != '' is that if a number falls at the start or end of the string, re.split puts an empty string on the outer end of the split.
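
A small illustration of that edge case, assuming an input that both starts and ends with a number (the sample string is made up for this sketch):

```python
import re

# Same shape as the pattern above, with + quantifiers.
regex = re.compile(r'(?: |^)(-?\d+(?:\.\d+)?)(?: |$)')

raw_parts = re.split(regex, "12 apples 3.5")
print(raw_parts)  # ['', '12', 'apples', '3.5', '']

# Dropping the empty strings leaves only the real tokens.
cleaned = [x for x in raw_parts if x != '']
print(cleaned)  # ['12', 'apples', '3.5']
```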

CodePudding user response：

The docs say about isalpha:

Return True if all characters in the string are alphabetic

This is not true for any of the words in your input, since each one contains a digit (string1, string2, ...), so isalpha returns False for all of them. If it were true, your code would raise an IndexError on out[i-1], because out is empty and that index does not exist.

The if block is never entered, so the error inside it stays hidden. Even so, the right-hand side of the assignment should reference l[i], not out[i], and you should use out.append rather than assigning to out[i-1].
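
Both points can be checked in isolation. This is only a small sketch with a shortened input, not your full program:

```python
words = "string1 862.4 string2".split(" ")

# isalpha() is False for every token: the words contain a digit,
# and the numbers contain digits and a dot.
alpha_flags = [w.isalpha() for w in words]
print(alpha_flags)  # [False, False, False]

# Assigning to an index of an empty list fails, which is what the
# hidden line inside the if block would have done.
out = []
try:
    out[0] = "x"
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```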

Here is a correction:

s = "string1 862.4 string2 755.84 string3 string4 56.8"
l = s.split(" ")

out = []
addnew = True
for word in l:
    try:
        float(word)
    except ValueError:  # It's not a number
        if addnew:  # First time in series?
            out.append(word)
            addnew = False
        else:  # Extend the non-number string series
            out[-1] += f"-{word}"
    else:  # It's a number
        out.append(word)
        addnew = True

print(out)

CodePudding user response：

A convenient way to check whether a token is a number is to try converting it with float, and then use a while loop so that you can append elements and skip the token that has already been merged.

s = "string1 862.4 string2 755.84 string3 string4 56.8"
l = s.split(" ")


def is_num(num):
    try:
        float(num)
        return True
    except ValueError:
        return False


out, i = [], 1
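while i < len(l) + 1:
    # Merge two adjacent non-numeric tokens; the bounds check protects the last token.
    if i < len(l) and not (is_num(l[i-1]) or is_num(l[i])):
        out.append(f"{l[i-1]}-{l[i]}")
        i += 1  # skip the token that was just merged
    else:
        out.append(l[i-1])
    i += 1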

print(out)
# ['string1', '862.4', 'string2', '755.84', 'string3-string4', '56.8']
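
Note that the loop above merges at most two words at a time, so a run such as string3 string4 string5 would come out as string3-string4 followed by string5. A minimal sketch that merges runs of any length, assuming the same float-based check (join_word_runs is a hypothetical helper name, not part of the answer above):

```python
def is_num(token):
    """Return True if float() accepts the token."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def join_word_runs(tokens, sep="-"):
    """Merge each run of consecutive non-numeric tokens into one string."""
    out, i = [], 0
    while i < len(tokens):
        if is_num(tokens[i]):
            out.append(tokens[i])
            i += 1
            continue
        # Advance j to the first position after the current run of words.
        j = i
        while j < len(tokens) and not is_num(tokens[j]):
            j += 1
        out.append(sep.join(tokens[i:j]))
        i = j
    return out


result = join_word_runs("a b c 1.5 d".split())
print(result)  # ['a-b-c', '1.5', 'd']
```

Keep in mind that float() also accepts strings such as 'nan' and 'inf', so those would be treated as numbers here.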

CodePudding user response：

Using the itertools and re modules:

from re import fullmatch
from itertools import groupby

s = "string1 string0 862.4 111 555 string2 755.84 string3 string4 string5 56.8"
out = ['-'.join(g) for _, g in groupby(s.split(), lambda x: fullmatch(r'[\d\.]+', x))]

>>> out

['string1-string0',
 '862.4',
 '111',
 '555',
 'string2',
 '755.84',
 'string3-string4-string5',
 '56.8']
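
Why this joins words but keeps adjacent numbers apart: fullmatch returns None for every word, and equal keys are grouped together, while each number gets its own Match object, and two distinct Match objects never compare equal. A small sketch of that behaviour, using a shortened input chosen just for illustration:

```python
from itertools import groupby
from re import fullmatch

tokens = "a b 1 2 c".split()

# None for words, a separate Match object for each number.
keys = [fullmatch(r'[\d.]+', t) for t in tokens]
word_keys_equal = keys[0] == keys[1]      # None == None
number_keys_equal = keys[2] == keys[3]    # two different Match objects

grouped = ['-'.join(g) for _, g in groupby(tokens, lambda t: fullmatch(r'[\d.]+', t))]
print(grouped)  # ['a-b', '1', '2', 'c']
```

One caveat: the character class [\d.]+ also accepts strings like '1.2.3' or '...' and does not accept a leading minus sign, so a stricter pattern may be needed depending on your data.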