Convert list into dict of prefix with different delimiters

Time:02-06

I am trying to convert a list of items that share three unique prefixes (e.g. apple_, banana_, water_melon_) into a dict keyed by prefix.

The initial list looks like this: table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3", "water_melon_1", "water_melon_2", "water_melon_3"]

My desired outcome would look like this: {"apple": ["_1", "_2", "_3"], "banana": ["_1", "_2", "_3"], "water_melon": ["_1", "_2", "_3"]}

I've tried this:

prefixes = ["apple_", "banana_", "water_melon_"]

res = [[table_id for table_id in table_ids if table_id.startswith(prefix)] for prefix in prefixes]

However, this creates a list of lists grouped by prefix rather than a dict.

CodePudding user response:

You can use str.rsplit and collections.defaultdict.

from collections import defaultdict
res = defaultdict(list)
for t in table_ids:
    res[t.rsplit('_', 1)[0]].append('_' + t.rsplit('_', 1)[1])
print(res)

Output:

defaultdict(<class 'list'>, {'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']})
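
If the defaultdict wrapper in the printed output is unwanted, it can be converted to a plain dict once grouping is done. A minimal sketch of the same idea, splitting each item only once; the function name group_by_prefix is made up for this illustration and it assumes every item contains at least one underscore:

```python
from collections import defaultdict


def group_by_prefix(table_ids):
    """Group ids by everything before the last underscore (hypothetical helper)."""
    grouped = defaultdict(list)
    for table_id in table_ids:
        # rsplit with maxsplit=1 splits at the last underscore only, so
        # "water_melon_1" becomes ("water_melon", "1").
        # Assumption: an item with no underscore would fail to unpack here.
        prefix, suffix = table_id.rsplit("_", 1)
        grouped[prefix].append("_" + suffix)
    # Convert to a plain dict so it prints without the defaultdict wrapper.
    return dict(grouped)


table_ids = ["apple_1", "apple_2", "apple_3",
             "banana_1", "banana_2", "banana_3",
             "water_melon_1", "water_melon_2", "water_melon_3"]
print(group_by_prefix(table_ids))
```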

CodePudding user response:

You can't do this with a list comprehension because you're trying to create a dict (not a list), and you can't do it with a dict comprehension efficiently because you can't determine which entries go in each sublist without iterating over the original list in its entirety.

Here's an example of how to do it by iterating over the list and appending to entries in a dictionary:

>>> table_ids = ["apple_1", "apple_2", "apple_3", "banana_1", "banana_2", "banana_3", "water_melon_1", "water_melon_2", "water_melon_3"]
>>> tables = {}
>>> for x in table_ids:
...     t, _, i = x.rpartition("_")
...     tables.setdefault(t, []).append("_" + i)
...
>>> tables
{'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}

If you really wanted to do it in a nested dict/list comprehension, that'd look like:

>>> {t: ["_" + x.rpartition("_")[2] for x in table_ids if x.startswith(t)] for t in {x.rpartition("_")[0] for x in table_ids}}
{'apple': ['_1', '_2', '_3'], 'banana': ['_1', '_2', '_3'], 'water_melon': ['_1', '_2', '_3']}

Note that the list comprehension inside the dict comprehension rescans the whole list for every prefix, which makes this O(N^2) in the worst case, whereas the first version is O(N).
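
Both versions above split at the last underscore, which works for the sample data but would mis-group an item whose suffix itself contains an underscore. If the prefixes are known up front, as in the question's prefixes list, one possible alternative is to match against them directly. This is only a sketch: the function name group_by_known_prefixes and the sample item "apple_1_old" are made up for illustration, and each item costs one pass over the prefixes.

```python
def group_by_known_prefixes(table_ids, prefixes):
    """Group ids under known prefixes such as "apple_" (hypothetical helper)."""
    # Keys drop the trailing underscore: "apple_" -> "apple".
    grouped = {prefix.rstrip("_"): [] for prefix in prefixes}
    # Try longer prefixes first so a more specific prefix wins over a shorter one.
    ordered = sorted(prefixes, key=len, reverse=True)
    for table_id in table_ids:
        for prefix in ordered:
            if table_id.startswith(prefix):
                key = prefix.rstrip("_")
                # Keep the underscore on the suffix, as in the desired output.
                grouped[key].append(table_id[len(key):])
                break
        # Items that match no prefix are silently skipped in this sketch.
    return grouped


prefixes = ["apple_", "banana_", "water_melon_"]
table_ids = ["apple_1", "apple_1_old", "banana_2", "water_melon_3"]
print(group_by_known_prefixes(table_ids, prefixes))
```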
