Split a string on delimiters and act on the number of splits (Python)

Time:11-03

Assume I have the following list of strings:

animals = [
'rhino, grey, 30 July 2022',
'giraffe, 30 March 2022',
'bird',
'llama, brown, 8 April 2022',
'tiger'
]

where the first item of the list (animals[0]) is the string 'rhino, grey, 30 July 2022', the second (animals[1]) is 'giraffe, 30 March 2022', the third is 'bird', and so on. The order of the items in each string is always animal name, color, birth date, but in some cases the color or the date might be missing.

The code I would like to write needs to do the following: for each string in the list, split on the comma and append each part to its own list.

I have:

name = []
color = []
birthday = []

for animal in animals:
    name.append((animal.split(","))[0])
    color.append((animal.split(","))[1])
    birthday.append((animal.split(","))[2])

However, this does not work because the color or the birthday is sometimes missing, so I run into an IndexError (list index out of range). Can anyone think of a way to fix this, for example by counting the number of parts the string was split into?

CodePudding user response:

You can insert a blank placeholder for each missing item. Then you can split the data into the lists name, color, birthday.

When using these lists, just skip the blank items (or whatever null markers you used).
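The placeholder idea above can be sketched roughly as follows. This is only an illustration: the helper name pad_fields and the choice of None as the null marker are assumptions, not part of the answer. Note that padding on the right puts a lone second field (such as the giraffe's date) into the color column.

```python
animals = [
    'rhino, grey, 30 July 2022',
    'giraffe, 30 March 2022',
    'bird',
    'llama, brown, 8 April 2022',
    'tiger',
]

def pad_fields(animal, width=3, filler=None):
    """Split on commas and pad the result with `filler` up to `width` items.

    Hypothetical helper: the name and the None marker are illustrative choices.
    """
    parts = [part.strip() for part in animal.split(',')]
    # A negative repeat count yields an empty list, so longer rows are left as-is.
    return parts + [filler] * (width - len(parts))

rows = [pad_fields(animal) for animal in animals]

# Transpose the padded rows into three parallel lists of equal length.
name, color, birthday = (list(column) for column in zip(*rows))
```

Because every row now has three items, the three lists stay aligned by index, and missing values can be skipped with a simple `is None` check.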

CodePudding user response:

The code below should work when an element is missing its birthday or color:

animals = [
'rhino, grey, 30 July 2022',
'giraffe, 30 March 2022',
'bird',
'llama, brown, 8 April 2022',
'tiger'
]

name = []
color = []
birthday = []

for animal in animals:
    item_split = animal.split(", ")
    
    name.append(item_split[0])
    
    if len(item_split) == 3:
        color.append(item_split[1])
        birthday.append(item_split[2])

    elif len(item_split) == 2:
        # Only one extra field: treat it as a birthday if it ends in a numeric year
        year = item_split[1].split(' ')[-1].strip()
        if year.isnumeric():
            birthday.append(item_split[1])
            color.append(' ')
        else:
            color.append(item_split[1])
            birthday.append(' ')
    elif len(item_split) == 1:
        color.append(' ')
        birthday.append(' ')

CodePudding user response:

You'll probably run into trouble when using separate lists if certain data is missing (you won't know which color goes with which row, for example).

But, with regards to your question:

name = []
color = []
birthday = []

for animal in animals:
    split_animal = animal.split(",")
    if len(split_animal) == 3:
        name.append(split_animal[0])
        color.append(split_animal[1])
        birthday.append(split_animal[2])
    elif len(split_animal) == 2:
        name.append(split_animal[0])
        color.append(split_animal[1])
    elif len(split_animal) == 1:
        name.append(split_animal[0])
    else:
        pass

This is a very specific answer to your question that assumes the structure of your data. It isn't very efficient, either. You'd have to specify other aspects of the data and your intent for a more thorough solution.
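One way to avoid the row-alignment problem mentioned at the start of this answer is to keep one record per animal instead of three separate lists. The following is a minimal sketch, assuming the same positional reading as the code above (a lone second field is treated as the color); FIELDS and to_record are hypothetical names introduced for illustration.

```python
animals = [
    'rhino, grey, 30 July 2022',
    'giraffe, 30 March 2022',
    'bird',
    'llama, brown, 8 April 2022',
    'tiger',
]

# Assumed field order, taken from the question: name, color, birthday.
FIELDS = ('name', 'color', 'birthday')

def to_record(animal):
    """Return a dict with every field present; missing ones are None."""
    parts = [part.strip() for part in animal.split(',')]
    record = dict.fromkeys(FIELDS)       # every field starts as None
    record.update(zip(FIELDS, parts))    # fill in the fields that are present
    return record

records = [to_record(animal) for animal in animals]
```

Each record then carries its own name, color and birthday, so a missing value can no longer shift the data of another animal.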

CodePudding user response:

One problem you have to account for is a missing middle 'color' part, in which case you have to shift the value at index 1 (if present) to the next index (birthday). One way to do this is to check whether the value contains any digit (not sure how reliable that is):

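import re

def split_props(animal):
    # Strip each part so values do not keep the space that follows a comma
    parts = [part.strip() for part in animal.split(',')]
    # A digit in the second field suggests it is a date, so the color is missing
    if len(parts) > 1 and re.search(r'[0-9]', parts[1]):
        parts.insert(1, None)
    return parts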

animal_props = [split_props(animal) for animal in animals]

name, color, birthday = [[a[i] if i < len(a) else None for a in animal_props] for i in [0, 1, 2]]
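If the digit check feels too loose, a stricter alternative is to try parsing the field as a date. This is a sketch only, assuming every date follows the 'day month-name year' pattern shown in the question (for example '30 July 2022') and that month names are in English; looks_like_date and split_animal_fields are hypothetical names, and %B depends on the active locale.

```python
from datetime import datetime

def looks_like_date(text, fmt='%d %B %Y'):
    """Return True if `text` parses with `fmt` (assumed format, e.g. '30 July 2022')."""
    try:
        datetime.strptime(text.strip(), fmt)
    except ValueError:
        return False
    return True

def split_animal_fields(animal):
    """Split into [name, color, birthday], using None for missing fields."""
    parts = [part.strip() for part in animal.split(',')]
    # With only two fields, a parseable date means the color was left out.
    if len(parts) == 2 and looks_like_date(parts[1]):
        parts.insert(1, None)
    # Pad on the right so every result has three items.
    return parts + [None] * (3 - len(parts))
```

For example, split_animal_fields('giraffe, 30 March 2022') places the date in the third position, while a string with a name and a color keeps the color in the second position.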