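I have a list like the one below, and I want entries that should count as the same term (for example 'datamining' and 'data mining') to be treated as a single term. Is there a cleaner or more efficient way to do this than chaining str.replace calls?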
list1 = ['data mining', 'datamining', 'data science', 'graph model']
# desired result:
# list2 = ['data mining', 'data mining', 'data mining', 'graph model']
# current approach: one str.replace per variant, chained so the second call does not discard the first
list2 = [item.replace('datamining', 'data mining').replace('data science', 'data mining') for item in list1]
Answer:
Create a mapping from each variant to its canonical replacement, for example:
d = {
'datamining': 'data mining',
'data science': 'data mining',
}
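If there are many variants, it may be easier to maintain them grouped by canonical term and invert that into the lookup dict. A minimal sketch, assuming a hypothetical `synonyms` dict (canonical term -> list of variants) that is not part of the original answer:

```python
# Hypothetical layout: canonical term -> list of variant spellings.
synonyms = {
    'data mining': ['datamining', 'data science'],
}

# Invert it into the variant -> canonical mapping used for lookups.
d = {variant: canonical
     for canonical, variants in synonyms.items()
     for variant in variants}

print(d)  # {'datamining': 'data mining', 'data science': 'data mining'}
```

Adding a new variant is then a one-line change to the relevant list.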
Then replace each item in list1 with its canonical form. d.get(k, k) returns k itself when k has no entry in the mapping, so unmapped items are left unchanged:
list2 = [d.get(k, k) for k in list1]
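Putting it together as a self-contained sketch (variable names follow the question; `unique_terms` is only an illustrative name, and the deduplication step is an optional extra in case "one word" should also mean one entry):

```python
list1 = ['data mining', 'datamining', 'data science', 'graph model']

# Variant -> canonical term; anything not listed is kept unchanged.
d = {
    'datamining': 'data mining',
    'data science': 'data mining',
}

list2 = [d.get(k, k) for k in list1]
print(list2)  # ['data mining', 'data mining', 'data mining', 'graph model']

# Optional: collapse duplicates while keeping first-seen order.
unique_terms = list(dict.fromkeys(list2))
print(unique_terms)  # ['data mining', 'graph model']
```

Unlike str.replace, the dict lookup matches whole items only, so a longer string such as 'data science tools' would not be partially rewritten, and each item needs a single lookup no matter how many variants are in the mapping.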