The scenario is something like this.
After joining several lists using:
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3
mainlist.sort()
mainlist now looks like this:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']
I would like to remove every value that is not duplicated. If a value occurs more than once in mainlist, all of its occurrences must stay untouched; if it occurs only once, I would like to delete it.
I tried the following approach, but something isn't working:
for i in mainlist:
    if mainlist.count(i) <= 1:
        mainlist.remove(i)
    else:
        continue
but what I get back is a list that looks like the following:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'E'] # "D" is no longer present, but "E" is still there. Why?
What I would like to get is a list like this:
mainlist = ['A', 'A', 'B', 'B', 'C', 'C'] # all non-duplicate values have been deleted
I can delete the duplicates with the code below:
for i in mainlist:
    if mainlist.count(i) > 1:
        mainlist.remove(i)
    else:
        continue
and then as a final result:
mainlist = ['A','B','C']
But the real question is: how can I delete the non-duplicates in a list?
CodePudding user response:
You can use collections.Counter to keep track of the frequencies of each item:
from collections import Counter
counts = Counter(mainlist)
[item for item in mainlist if counts[item] > 1]
This outputs:
['A', 'A', 'B', 'B', 'C', 'C']
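If the original list object itself has to change, as the loop in the question attempts, one option is slice assignment. This is only a minimal sketch that assumes the same mainlist as in the question; the name alias is hypothetical and is there just to show that the same list object is being updated.

```python
from collections import Counter

mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']
alias = mainlist  # hypothetical second reference to the same list object

counts = Counter(mainlist)
# Slice assignment overwrites the contents of the existing list,
# so every reference to it sees the filtered result.
mainlist[:] = [item for item in mainlist if counts[item] > 1]

print(mainlist)           # ['A', 'A', 'B', 'B', 'C', 'C']
print(alias is mainlist)  # True
```

The comprehension is fully evaluated before the assignment happens, so the list is never modified while it is being iterated.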
CodePudding user response:
Use collections.Counter to count the list elements. Use a list comprehension to keep only the elements that occur more than once. Note that the list does not have to be sorted.
from collections import Counter
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
mainlist = list1 + list2 + list3
cnt = Counter(mainlist)
print(cnt)
# Counter({'A': 2, 'B': 2, 'C': 2, 'D': 1, 'E': 1})
dups = [x for x in mainlist if cnt[x] > 1]
print(dups)
# ['A', 'B', 'A', 'B', 'C', 'C']
CodePudding user response:
You can find duplicates like this:
duplicates = [item for item in mainlist if mainlist.count(item) > 1]
CodePudding user response:
Another solution, using numpy:
import numpy as np
u, c = np.unique(mainlist, return_counts=True)
out = np.repeat(u[c > 1], c[c > 1])
print(out)
Prints:
['A' 'A' 'B' 'B' 'C' 'C']
CodePudding user response:
Your problem is that you are modifying the list while iterating over it. When "D" is removed, "E" shifts down to index 6, which the loop has already visited. The loop then stops because there are no more elements after that position, so "E" is never checked.
Create a copy of the list and only operate on that list:
new_list = list(mainlist)
for i in mainlist:
    if mainlist.count(i) <= 1:
        new_list.remove(i)
    else:
        continue
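The effect of removing items during iteration can be made visible by recording which elements the loop actually visits. The sketch below reuses the loop from the question; the visited list is a hypothetical addition used purely for tracing.

```python
mainlist = ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'E']
visited = []  # hypothetical helper: records every element the loop looks at

for item in mainlist:
    visited.append(item)
    if mainlist.count(item) <= 1:
        # remove() shifts all later elements one position to the left
        mainlist.remove(item)

print(visited)   # ['A', 'A', 'B', 'B', 'C', 'C', 'D']
print(mainlist)  # ['A', 'A', 'B', 'B', 'C', 'C', 'E']
```

"E" never appears in visited: after "D" is removed it sits at the position the loop has just finished with, so the iteration ends without testing it.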
CodePudding user response:
If you want to output only a list of duplicate elements in your lists, you can use sets and a comprehension to keep only the duplicates.
list1 = ["A","B"]
list2 = ["A","B","C"]
list3 = ["C","D","E"]
fulllist = list1 + list2 + list3
fullset = set(list1) | set(list2) | set(list3)
dups = [x for x in fullset if fulllist.count(x) > 1]
print(dups) # e.g. ['A', 'C', 'B'] (set order is arbitrary, so the order may differ)
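Because sets are unordered, the result above can come out in any order. If a stable order matters, one possible variation (an assumption on my part, not part of the approach above) is to deduplicate with dict.fromkeys, which keeps first-occurrence order, and to count once with collections.Counter:

```python
from collections import Counter

list1 = ["A", "B"]
list2 = ["A", "B", "C"]
list3 = ["C", "D", "E"]
fulllist = list1 + list2 + list3

counts = Counter(fulllist)
# dict.fromkeys keeps the first occurrence of each key in insertion order
# (guaranteed for dicts since Python 3.7).
dups = [x for x in dict.fromkeys(fulllist) if counts[x] > 1]
print(dups)  # ['A', 'B', 'C']
```

As with the set version, each duplicated value is listed only once.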