List comprehension - combine definitions of duplicate words so that each word is unique but may have

Time:10-11

I have a two-dimensional list of words and their respective definitions. As the example below shows, some words appear more than once with different definitions. I would like to combine the definitions of duplicate words so that each word appears only once.

list_of_lists = [
  ['absorption', 'a process in which one substance permeates another'],
  ['absorption', 'when radiated energy is retained on passing through a medium'],
  ['aerobic', 'depending on free oxygen or air'],
  ['aerobic', 'enhancing respiratory and circulatory efficiency'],
  ['chain reaction', 'a self-sustaining nuclear reaction'],
  ['chain reaction', 'a series of chemical reactions in which the product of one is a reactant in the next']
]

Expected output after some programming magic

['absorption', 'a process in which one substance permeates another', 'when radiated energy is retained on passing through a medium']
['aerobic', 'depending on free oxygen or air', 'enhancing respiratory and circulatory efficiency']
['chain reaction', 'a self-sustaining nuclear reaction', 'a series of chemical reactions in which the product of one is a reactant in the next']

CodePudding user response:

Assuming this is your data:

list_of_lists = [
  ['absorption', 'a process in which one substance permeates another'],
  ['absorption', 'when radiated energy is retained on passing through a medium'],
  ['aerobic', 'depending on free oxygen or air'],
  ['aerobic', 'enhancing respiratory and circulatory efficiency'],
  ['chain reaction', 'a self-sustaining nuclear reaction'],
  ['chain reaction', 'a series of chemical reactions in which the product of one is a reactant in the next'],
]

You could use a groupby expression like so:

from itertools import groupby
from operator import itemgetter

# groupby only merges adjacent rows, so rows for the same word must sit next to each other
for key, group in groupby(list_of_lists, itemgetter(0)):
  print([key] + list(map(itemgetter(1), group)))

Output:

['absorption', 'a process in which one substance permeates another', 'when radiated energy is retained on passing through a medium']
['aerobic', 'depending on free oxygen or air', 'enhancing respiratory and circulatory efficiency']
['chain reaction', 'a self-sustaining nuclear reaction', 'a series of chemical reactions in which the product of one is a reactant in the next']
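Since the question asks about a list comprehension, here is a minimal sketch of the same groupby idea written as one. It assumes the rows may not arrive grouped by word, so it sorts them first; the `rows` data and the `merged` name are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: the two 'aerobic' rows are NOT adjacent.
rows = [
    ['aerobic', 'depending on free oxygen or air'],
    ['absorption', 'a process in which one substance permeates another'],
    ['aerobic', 'enhancing respiratory and circulatory efficiency'],
]

# Sort by word so equal words become adjacent; sorted() is stable,
# so each word's definitions keep their original relative order.
by_word = sorted(rows, key=itemgetter(0))

# One output row per word: the word followed by all of its definitions.
merged = [
    [word, *(definition for _, definition in group)]
    for word, group in groupby(by_word, key=itemgetter(0))
]

for entry in merged:
    print(entry)
```

Sorting changes the order of the words in the result (alphabetical here), so if the original word order matters, a dictionary-based approach is likely the better fit.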

CodePudding user response:

You can use defaultdict from the standard library's collections module to build a dictionary that maps each word to a list of its definitions. The dictionary may be useful on its own, and it is also easy to convert into a list of tuples:

from collections import defaultdict

l = [
    ('absorption','a process in which one substance permeates another'),
    ('absorption', 'when radiated energy is retained on passing through a medium'),
    ('aerobic', 'depending on free oxygen or air'),
    ('aerobic', 'enhancing respiratory and circulatory efficiency'),
    ('chain reaction', 'a self-sustaining nuclear reaction'),
    ('chain reaction', 'a series of chemical reactions in which the product of one is a reactant in the next')
]

res = defaultdict(list)

for k, v in l:
    res[k].append(v)

# res is a dict so you can look up words:
print(res['aerobic'])
# ['depending on free oxygen or air', 'enhancing respiratory and circulatory efficiency']

# to get back a list of tuples, just pass the dict items to list()
collected_list = list(res.items())

# [('absorption',
#   ['a process in which one substance permeates another',
#   'when radiated energy is retained on passing through a medium']),
# ('aerobic',
#   ['depending on free oxygen or air',
#   'enhancing respiratory and circulatory efficiency']),
# ('chain reaction',
#   ['a self-sustaining nuclear reaction',
#   'a series of chemical reactions in which the product of one is a reactant in the next'])]
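If you want the flat rows shown in the question (word first, then its definitions) rather than (word, list) tuples, a list comprehension over the dictionary items should do it. This is a small sketch under that assumption; `pairs` and `flat` are illustrative names, and the input is deliberately not grouped to show that the dictionary approach does not need sorted data.

```python
from collections import defaultdict

# Hypothetical input: the 'absorption' rows are not adjacent.
pairs = [
    ('absorption', 'a process in which one substance permeates another'),
    ('aerobic', 'depending on free oxygen or air'),
    ('absorption', 'when radiated energy is retained on passing through a medium'),
]

# Collect every definition under its word.
res = defaultdict(list)
for word, definition in pairs:
    res[word].append(definition)

# Dicts preserve insertion order (Python 3.7+), so words appear
# in the order they were first seen in the input.
flat = [[word, *definitions] for word, definitions in res.items()]

for entry in flat:
    print(entry)
```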