Remove duplicate adjacent of specific string from list

Time:12-07

I want to remove adjacent duplicates of specific strings (those wrapped in <word> tags) from a list. Suppose I have the list below:

list_ex = ['I', 'went', 'to', 'the', 'big', 'conference', ',', 'I', 'presented', 'myself', 'there', '.', 'After', 'the', '<word>conference</word>', '<word>conference</word>', ',', 'I', 'took', 'a', 'taxi', 'to', 'go', 'to', 'the', '<word>hotel</word>', '<word>hotel</word>', '.', 'Tomorrow', 'I', 'will', 'go', 'to', '<word>conference</word>', 'again', '.']

Here is what I have tried so far:

new_list_ex = []
for item in list_ex:
    if item.startswith('<word>'):
        if item in new_list_ex and (item == list_ex[list_ex.index(item) + 1]):
            continue
    new_list_ex.append(item)

My output of new_list_ex:

['I', 'went', 'to', 'the', 'big', 'conference', ',', 'I', 'presented', 'myself', 'there', '.', 'After', 'the', '<word>conference</word>', ',', 'I', 'took', 'a', 'taxi', 'to', 'go', 'to', 'the', '<word>hotel</word>', '.', 'Tomorrow', 'I', 'will', 'go', 'to', 'again', '.']

Desired output:

['I', 'went', 'to', 'the', 'big', 'conference', ',', 'I', 'presented', 'myself', 'there', '.', 'After', 'the', '<word>conference</word>', ',', 'I', 'took', 'a', 'taxi', 'to', 'go', 'to', 'the', '<word>hotel</word>', '.', 'Tomorrow', 'I', 'will', 'go', 'to', '<word>conference</word>', 'again', '.']

I suspect that list_ex[list_ex.index(item) + 1], which I use to check the adjacent element, does not work properly. How can I adjust the code to get the desired output?

Please note that the order of the items in this list is important.

CodePudding user response:

Check whether an item flagged with <word> is equal to the last item already in the new list (new_list_ex[-1]). If it is, skip it with continue; otherwise, append it to the new list.

list_ex = ['I', 'went', 'to', 'the', 'big', 'conference', ',', 'I', 'presented', 'myself', 'there', '.', 'After', 'the', '<word>conference</word>', '<word>conference</word>', ',', 'I', 'took', 'a', 'taxi', 'to', 'go', 'to', 'the', '<word>hotel</word>', '<word>hotel</word>', '.', 'Tomorrow', 'I', 'will', 'go', 'to', '<word>conference</word>', 'again', '.']

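new_list_ex = []
for item in list_ex:
    # Checking that new_list_ex is non-empty avoids an IndexError if the first item is tagged.
    if item.startswith('<word>') and new_list_ex and item == new_list_ex[-1]:
        continue
    new_list_ex.append(item)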
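
As for why the original check misfired: list.index(item) always returns the position of the first match, so for the last '<word>conference</word>' it still inspects the neighbour of the first one and skips it. Below is a minimal sketch of the same approach wrapped in a function. The name dedupe_adjacent_tagged and the prefix parameter are illustrative and not part of the question.

```python
def dedupe_adjacent_tagged(items, prefix="<word>"):
    """Drop an item only if it starts with `prefix` and repeats the item kept just before it."""
    result = []
    for item in items:
        # `result and ...` avoids an IndexError when the very first item is tagged.
        if item.startswith(prefix) and result and item == result[-1]:
            continue
        result.append(item)
    return result


# Shortened, hypothetical sample in the same shape as list_ex.
sample = ['the', '<word>conference</word>', '<word>conference</word>', ',',
          'to', '<word>conference</word>', 'again', 'again']

# list.index always reports the first match, whichever occurrence you are looking at.
print(sample.index('<word>conference</word>'))  # 1

print(dedupe_adjacent_tagged(sample))
```

Untagged repeats such as 'again', 'again' are left alone, and a tagged item that reappears later (not adjacent) is kept, so the order is preserved.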

CodePudding user response:

I think this is what you mean:


list_ex = ['I', 'went', 'to', 'the', 'big', 'conference', ',', 'I', 'presented', 'myself', 'there', '.', 'After', 'the', '<word>conference</word>', '<word>conference</word>', ',', 'I', 'took', 'a', 'taxi', 'to', 'go', 'to', 'the', '<word>hotel</word>', '<word>hotel</word>', '.', 'Tomorrow', 'I', 'will', 'go', 'to', '<word>conference</word>', 'again', '.']

new_list = []

for i in list_ex:
    if "<" in i and ">" in i:
        # Tagged item: keep only the text between the tags.
        word = i.split(">")[1].split("<")[0]
    else:
        word = i
    # Skip anything already collected, whether it was adjacent or not.
    if word not in new_list:
        new_list.append(word)


print(new_list)
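
If only adjacent repeats of tagged items should be collapsed, with the tags and all other words kept as they are, itertools.groupby is one possible alternative, since it groups consecutive equal items. This is a rough sketch under that assumption; the function name collapse_tagged_runs and the prefix parameter are made up for illustration.

```python
from itertools import groupby


def collapse_tagged_runs(items, prefix="<word>"):
    """Collapse each run of identical consecutive tagged items into a single item."""
    result = []
    # groupby yields one (key, run) pair per stretch of consecutive equal items.
    for key, run in groupby(items):
        if key.startswith(prefix):
            result.append(key)   # keep a single copy of the tagged run
        else:
            result.extend(run)   # keep untagged items untouched, repeats included
    return result


# Shortened, hypothetical sample in the same shape as list_ex.
sample = ['the', '<word>hotel</word>', '<word>hotel</word>', '.',
          'to', '<word>hotel</word>', 'again']
print(collapse_tagged_runs(sample))
```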
