I have a list of strings that I want to group together if they contain a specific substring from a master list.
Example Input:
["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E", "Speaker_BMW_G71Q3E5T7", "Engine_Benz_G9A2B4C6D8E", "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
Master List:
["G71Q3E5T7", "G9A2B4C6D8E"]
Expected Output:
[["Audi_G71Q3E5T7_Coolant", "Speaker_BMW_G71Q3E5T7", "Toyota_Exhaust_G71Q3E5T7"], ["Volt_Battery_G9A2B4C6D8E", "Engine_Benz_G9A2B4C6D8E", "Ford_G9A2B4C6D8E_Wheel"]]
I haven't found any example or solution online. I am aware that the itertools.groupby()
function is useful in scenarios like this, but I am struggling to make it work.
CodePudding user response:
Example inputs:
in_list = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E", "Speaker_BMW_G71Q3E5T7",
           "Engine_Benz_G9A2B4C6D8E", "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
master_list = ["G71Q3E5T7", "G9A2B4C6D8E"]
out_list = []
For every master item, check whether it appears in each input item, and add the matches to out_list[index]:
for index, m in enumerate(master_list):
    out_list.append([])
    for i in in_list:
        if m in i:
            out_list[index].append(i)
print(out_list)
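The same loop can also be written as a nested list comprehension. This is only a sketch of the approach above; the function name group_by_substring is hypothetical and not part of the original answer. Note that a string containing several master items would land in several groups, and a string containing none is dropped.

```python
def group_by_substring(strings, keys):
    """Return one list per key, holding the strings that contain that key.

    Hypothetical helper name, used here for illustration only.
    """
    return [[s for s in strings if key in s] for key in keys]


in_list = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E", "Speaker_BMW_G71Q3E5T7",
           "Engine_Benz_G9A2B4C6D8E", "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
master_list = ["G71Q3E5T7", "G9A2B4C6D8E"]

# Groups come out in the order of master_list; strings keep their input order.
out_list = group_by_substring(in_list, master_list)
print(out_list)
```

Like the explicit loop, this scans the whole input once per master item, which is fine for short lists.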
CodePudding user response:
itertools.groupby is not a good fit here: it only groups consecutive items that share a key, so you would first need to sort the data by ID.
You can use a regex to extract the IDs, and a dictionary/defaultdict to collect the data:
L = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E",
     "Speaker_BMW_G71Q3E5T7", "Engine_Benz_G9A2B4C6D8E",
     "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
ids = ["G71Q3E5T7", "G9A2B4C6D8E"]
import re
from collections import defaultdict
regex = re.compile('|'.join(map(re.escape, ids)))
# re.compile(r'G71Q3E5T7|G9A2B4C6D8E', re.UNICODE)
out = defaultdict(list)
for item in L:                        # for each item
    m = regex.search(item)            # try to find an ID
    if m:                             # if an ID was found
        out[m.group()].append(item)   # add to the appropriate list
out = list(out.values())
Output:
[['Audi_G71Q3E5T7_Coolant',
'Speaker_BMW_G71Q3E5T7',
'Toyota_Exhaust_G71Q3E5T7'],
['Volt_Battery_G9A2B4C6D8E',
'Engine_Benz_G9A2B4C6D8E',
'Ford_G9A2B4C6D8E_Wheel']]
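For comparison, here is a rough sketch of what an itertools.groupby version would look like, to illustrate the sorting requirement. The helper find_id is a hypothetical name introduced for this example. The groups come out in sorted ID order rather than in order of first appearance, which happens to be the same for this input.

```python
import re
from itertools import groupby

L = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E",
     "Speaker_BMW_G71Q3E5T7", "Engine_Benz_G9A2B4C6D8E",
     "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
ids = ["G71Q3E5T7", "G9A2B4C6D8E"]

regex = re.compile('|'.join(map(re.escape, ids)))


def find_id(item):
    """Return the first ID found in item (hypothetical helper for this sketch)."""
    return regex.search(item).group()


# Keep only items that contain an ID, so find_id never sees a failed match.
matched = [item for item in L if regex.search(item)]

# groupby only merges adjacent items with equal keys, so sort by ID first.
# sorted() is stable, so items keep their input order within each group.
grouped = [list(group) for _, group in groupby(sorted(matched, key=find_id), key=find_id)]
print(grouped)
```

The extra sort and the repeated regex searches are the reason the dictionary approach is usually simpler for this task.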