Python - Sequentially group strings in a list starting with the same substring

Time:08-20

I have a list of strings that looks like this:

my_list = ["A-Item one", "B-Item two", "A-Item three", "A-Item four", "B-Item five", "B-Item six"]

I'd like to sequentially group the items in the list if they start with the same letter. So the output would look something like this:

grouped_items = [["A-Item one"], ["B-Item two"], ["A-Item three", "A-Item four"], ["B-Item five", "B-Item six"]]

I don't want to group all the items in two groups. I want to create a new group each time an item starts with a different letter than the previous one.

CodePudding user response:

You can use itertools.groupby, which groups consecutive items that share the same key:

from itertools import groupby

my_list = [
    "A-Item one",
    "B-Item two",
    "A-Item three",
    "A-Item four",
    "B-Item five",
    "B-Item six",
]

out = [list(g) for _, g in groupby(my_list, lambda k: k[0])]
print(out)

Prints:

[
    ["A-Item one"],
    ["B-Item two"],
    ["A-Item three", "A-Item four"],
    ["B-Item five", "B-Item six"],
]
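The key k[0] only looks at the first character. If the prefixes can be longer than one letter, a possible variation is to use everything before the first "-" as the key. This is a minimal sketch: the helper name prefix and the sample list items are made up for illustration, and it assumes "-" always separates the prefix from the rest of the string.

```python
from itertools import groupby


def prefix(item):
    # Hypothetical helper: the text before the first "-" is the group key.
    # If there is no "-", the whole string becomes the key.
    return item.split("-", 1)[0]


# Illustrative data with a two-letter prefix, not taken from the question.
items = ["AB-Item one", "AB-Item two", "A-Item three", "B-Item four", "B-Item five"]

# groupby only merges neighbouring items, so the list order is preserved.
grouped = [list(group) for _, group in groupby(items, key=prefix)]
print(grouped)
```

With k[0] as the key, "AB-Item two" and "A-Item three" would land in the same group, because both start with "A". Keying on the full prefix keeps them apart.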

CodePudding user response:

Andrej's answer definitely works, but if you don't want to import itertools, try this.

my_list = ["A-Item one", "B-Item two", "A-Item three", "A-Item four", "B-Item five", "B-Item six"]

grouped_items = [ [my_list[0]] ]
letter = my_list[0][0]

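# Start at index 1 because my_list[0] is already in grouped_items.
for x in range(1, len(my_list)):
    if letter == my_list[x][0]:
        # Same first letter as the previous item: extend the current group.
        grouped_items[-1].append(my_list[x])
    else:
        # First letter changed: start a new group and remember the new letter.
        grouped_items.append([my_list[x]])
        letter = my_list[x][0]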

You want to loop through my_list and start a new list every time the first character changes. To do that, we first set up the initial state: grouped_items starts with the first string from my_list already inside its own list, and the variable 'letter' holds the first character of that string. Note that this assumes my_list is not empty; otherwise my_list[0] raises an IndexError.

From here, we loop through the rest of my_list, starting at index 1 because index 0 is already in grouped_items. Compare 'letter' to the first character of each string. If they match, append the string to the last inner list in grouped_items (that is what [-1] refers to). If they do not match, append a new list containing that string to grouped_items and reassign 'letter' to the string's first character.
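The same loop can also be wrapped in a function that copes with an empty list. This is only a sketch of one way to do it: the name group_consecutive is made up here, and it iterates over the items directly instead of using indexes.

```python
def group_consecutive(items):
    """Group neighbouring strings that share the same first character."""
    grouped = []
    previous = None  # first character of the previous item; None before the first item
    for item in items:
        first = item[:1]  # slicing gives "" for an empty string instead of raising
        if grouped and first == previous:
            # Same first character as the previous item: extend the current group.
            grouped[-1].append(item)
        else:
            # First item, or the first character changed: start a new group.
            grouped.append([item])
            previous = first
    return grouped


my_list = ["A-Item one", "B-Item two", "A-Item three", "A-Item four", "B-Item five", "B-Item six"]
grouped_items = group_consecutive(my_list)
print(grouped_items)
```

Because the result starts out empty and the first item always opens a new group, there is no need to seed grouped_items with my_list[0], so an empty input simply returns an empty list.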

Ideally your goal should be easy-to-read code, so using itertools as Andrej suggests may be the better option. Hopefully, though, my answer helps you understand problems like this in the future.
