Remove non-duplicate values from a list

The scenario is something like this:

After joining several lists using:

list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]

mainlist = list1 + list2 + list3
mainlist.sort()

mainlist now looks like this:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

I would like to remove every value that is not duplicated. If a value occurs more than once in mainlist, all of its occurrences must be left untouched; if it occurs only once, I would like to delete it.

I tried the following approach, but something isn't working:

for i in mainlist:
    if mainlist.count(i) <= 1:
        mainlist.remove(i)
    else:
        continue

but what I get back is a list that looks like this:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'E']  # the value "D" is no longer present. Why?

What I would like to get is a list like this:

mainlist = ['A', 'A', 'B', 'B', 'C', 'C']  # all non-duplicate values have been deleted

I can delete the duplicates with the code below:

for i in mainlist:
    if mainlist.count(i) > 1:
        mainlist.remove(i)
    else:
        continue

and then, as a final result (starting from the mainlist above):

mainlist = ['A', 'B', 'C', 'D', 'E']

But the real question is: how can I delete the non-duplicates in a list?

CodePudding user response：

You can use collections.Counter() to keep track of the frequencies of each item:

from collections import Counter

counts = Counter(mainlist)
[item for item in mainlist if counts[item] > 1]

This outputs:

['A', 'A', 'B', 'B', 'C', 'C']
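
Since the question modifies mainlist in place, here is a minimal sketch of doing the same with the Counter approach by using slice assignment. The helper name keep_duplicates is only an illustrative assumption and is not part of the answer above.

```python
from collections import Counter


def keep_duplicates(items):
    """Return the items that occur more than once, in their original order."""
    counts = Counter(items)  # one pass to count every value
    return [item for item in items if counts[item] > 1]


mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

# Slice assignment replaces the contents of the existing list object,
# so other references to mainlist see the change as well.
mainlist[:] = keep_duplicates(mainlist)
print(mainlist)  # ['A', 'A', 'B', 'B', 'C', 'C']
```

Building a new list and assigning it back avoids removing elements from the list while looping over it.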

CodePudding user response：

Use collections.Counter to count the list elements, then use a list comprehension to keep only the elements that occur more than once. Note that the list does not have to be sorted.

from collections import Counter
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3

cnt = Counter(mainlist)
print(cnt)
# Counter({'A': 2, 'B': 2, 'C': 2, 'D': 1, 'E': 1})

dups = [x for x in mainlist if cnt[x] > 1]
print(dups)
# ['A', 'B', 'A', 'B', 'C', 'C']

CodePudding user response：

You can find duplicates like this:

duplicates = [item for item in mainlist if mainlist.count(item) > 1]

CodePudding user response：

Another solution, using numpy:

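import numpy as np

# u holds the sorted unique values, c how often each one occurs
u, c = np.unique(mainlist, return_counts=True)
# repeat each value that occurs more than once as many times as it occurred
out = np.repeat(u[c > 1], c[c > 1])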
print(out)

Prints:

['A' 'A' 'B' 'B' 'C' 'C']

CodePudding user response：

Your problem is that you are modifying the list while iterating over it. When "D" (at index 6) is removed, "E" moves down to index 6, which the loop has already visited. The next index, 7, no longer exists, so the loop stops without ever checking "E".

Create a copy of the list, iterate over the original, and remove items only from the copy:

new_list = list(mainlist)  # shallow copy that will be modified
for i in mainlist:  # iterate over the untouched original
    if mainlist.count(i) <= 1:
        new_list.remove(i)
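
To make the skipped element visible, here is a small sketch that records which items the original loop actually visits. The visited list is a name introduced here purely for illustration.

```python
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']

visited = []  # hypothetical helper list: every item the loop actually sees
for item in mainlist:
    visited.append(item)
    if mainlist.count(item) <= 1:
        # Removing shifts all later items one position to the left,
        # so the item that followed this one is never visited.
        mainlist.remove(item)

print(visited)   # ['A', 'A', 'B', 'B', 'C', 'C', 'D']
print(mainlist)  # ['A', 'A', 'B', 'B', 'C', 'C', 'E']
```

"E" never appears in visited, which is why it survives in the result.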

CodePudding user response：

If you only need each duplicated value once, rather than every occurrence, you can use sets and a comprehension to keep only the values that appear more than once.

list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]

fulllist = list1 + list2 + list3
fullset = set(list1) | set(list2) | set(list3)

dups = [x for x in fullset if fulllist.count(x) > 1]

print(dups)  # e.g. ['A', 'C', 'B'] (sets are unordered, so the order may vary)
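
If a reproducible order matters, one possible variation (an assumption here, not the answerer's code) is to read the duplicated values from a Counter, which keeps its keys in first-seen order on Python 3.7 and later.

```python
from collections import Counter

list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
fulllist = list1 + list2 + list3

counts = Counter(fulllist)  # counts every value in a single pass
# Keep each value once, in the order it first appeared, if it occurs more than once.
dups = [value for value, n in counts.items() if n > 1]
print(dups)  # ['A', 'B', 'C']
```

This also avoids calling fulllist.count() once per value, which rescans the whole list each time.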