I have a list of dicts like this:
list = [
{
"a": "1",
"b": "2",
"c": "3"
},
{
"a": "4",
"b": "2",
"c": "6"
},
{
"a": "7",
"b": "8",
"c": "9"
},
{
"a": "10",
"b": "11",
"c": "12"
},
{
"a": "13",
"b": "8",
"c": "15"
}
]
My goal is to extract all dicts in this list whose value for the key "b" appears more than once. In my example that would be:
list_duplicates = [
{
"a": "1",
"b": "2",
"c": "3"
},
{
"a": "4",
"b": "2",
"c": "6"
},
{
"a": "7",
"b": "8",
"c": "9"
},
{
"a": "13",
"b": "8",
"c": "15"
}
]
How can I do this? Maybe I can use the reverse logic: delete all dicts whose "b" value is not duplicated in my list?
CodePudding user response:
A pretty standard way to look for duplicates is to use a Counter (you can use a plain dict and a loop, but Counter makes it easier) and then filter on items with a count greater than 1.
>>> my_list = [
... {
... "a": "1",
... "b": "2",
... "c": "3"
... },
... {
... "a": "4",
... "b": "2",
... "c": "6"
... },
... {
... "a": "7",
... "b": "8",
... "c": "9"
... },
... {
... "a": "10",
... "b": "11",
... "c": "12"
... },
... {
... "a": "13",
... "b": "8",
... "c": "15"
... }
... ]
>>> from collections import Counter
>>> b_counts = Counter(d["b"] for d in my_list)
>>> list_duplicates = [d for d in my_list if b_counts[d["b"]] > 1]
>>> list_duplicates
[{'a': '1', 'b': '2', 'c': '3'}, {'a': '4', 'b': '2', 'c': '6'}, {'a': '7', 'b': '8', 'c': '9'}, {'a': '13', 'b': '8', 'c': '15'}]
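For reference, a minimal sketch of the plain-dict-and-loop variant mentioned above (same result, assuming the same my_list; Counter is just the tidier spelling of this):
# Count occurrences of each "b" value with an ordinary dict.
b_counts = {}
for d in my_list:
    b_counts[d["b"]] = b_counts.get(d["b"], 0) + 1

# Keep only the dicts whose "b" value occurs more than once.
list_duplicates = [d for d in my_list if b_counts[d["b"]] > 1]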
CodePudding user response:
Here is a functional version using itertools.groupby, itertools.chain, and filter:
from itertools import groupby, chain

list(chain(*filter(lambda x: len(x) > 1,
                   (list(g) for k, g in groupby(sorted(lst, key=lambda d: d['b']),
                                                key=lambda d: d['b']))
                   )))
Explanation: groupby on the value of 'b' (for this the list must be sorted by that key), then filter the groups by size, and finally chain the output to form a single list.
NB: I named the list lst so as not to conflict with the list builtin.
Output:
[{'a': '1', 'b': '2', 'c': '3'},
{'a': '4', 'b': '2', 'c': '6'},
{'a': '7', 'b': '8', 'c': '9'},
{'a': '13', 'b': '8', 'c': '15'}]
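As a variant, the repeated lambdas can be replaced with operator.itemgetter, and chain.from_iterable avoids the argument unpacking; a sketch, assuming the same lst as above:
from itertools import groupby, chain
from operator import itemgetter

# itemgetter('b') plays the role of lambda d: d['b'] for both sorting and grouping.
by_b = itemgetter('b')
groups = (list(g) for _, g in groupby(sorted(lst, key=by_b), key=by_b))
list(chain.from_iterable(g for g in groups if len(g) > 1))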
CodePudding user response:
This approach builds a hash map (countMap) whose keys are the distinct values of 'b' in lst and whose values are the lists of indices where each value occurs.
For the example above, countMap would look like this:
{'2': [0, 1], '8': [2, 4], '11': [3]}
From this, we can build the output list by appending only the dicts from the original list whose 'b' value occurs at more than one index.
# Map each distinct "b" value to the list of indices where it occurs.
countMap = {}
for i in range(len(lst)):
    val = lst[i]['b']
    if countMap.get(val) is not None:
        countMap[val].append(i)
    else:
        countMap[val] = [i]

# Collect the dicts whose "b" value appears at more than one index.
output = []
for i in countMap.keys():
    if len(countMap[i]) > 1:
        for j in countMap[i]:
            output.append(lst[j])

print(output)
Output
[{'a': '1', 'b': '2', 'c': '3'}, {'a': '4', 'b': '2', 'c': '6'}, {'a': '7', 'b': '8', 'c': '9'}, {'a': '13', 'b': '8', 'c': '15'}]
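The get/append bookkeeping can be shortened with collections.defaultdict; a sketch of the same index-map idea, assuming the same lst:
from collections import defaultdict

# Build the same value-to-indices map with less boilerplate.
count_map = defaultdict(list)
for i, d in enumerate(lst):
    count_map[d['b']].append(i)

output = [lst[j] for indices in count_map.values() if len(indices) > 1 for j in indices]
print(output)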
CodePudding user response:
With a list comprehension:
>>> [d for d in lst if len([di for di in lst if di["b"]==d["b"]])>1]
[{'a': '1', 'b': '2', 'c': '3'},
{'a': '4', 'b': '2', 'c': '6'},
{'a': '7', 'b': '8', 'c': '9'},
{'a': '13', 'b': '8', 'c': '15'}]
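Note that this nested comprehension re-scans the whole list for every element, so it is quadratic in the list length. The inner list also does not need to be materialized just to take its length; a sketch of the same idea counting matches lazily (still quadratic, but lighter on memory):
[d for d in lst if sum(di["b"] == d["b"] for di in lst) > 1]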