I have a list of dicts like this:
list = [
{
"a": "1",
"b": "2",
"c": "3"
},
{
"a": "4",
"b": "2",
"c": "6"
},
{
"a": "7",
"b": "8",
"c": "9"
},
{
"a": "10",
"b": "11",
"c": "12"
},
{
"a": "13",
"b": "8",
"c": "15"
}
]
My goal is to extract all dicts in this list whose value for the key "b" appears more than once. In my example that would be:
list_duplicates = [
{
"a": "1",
"b": "2",
"c": "3"
},
{
"a": "4",
"b": "2",
"c": "6"
},
{
"a": "7",
"b": "8",
"c": "9"
},
{
"a": "13",
"b": "8",
"c": "15"
}
]
How can I do this? Maybe I can use the reverse logic: delete all dicts whose "b" value is not duplicated in my list?
CodePudding user response:
A pretty standard way to look for duplicates is to use a Counter (you can use a plain dict and a loop, but Counter makes it easier) and then filter on items with a count greater than 1.
>>> my_list = [
... {
... "a": "1",
... "b": "2",
... "c": "3"
... },
... {
... "a": "4",
... "b": "2",
... "c": "6"
... },
... {
... "a": "7",
... "b": "8",
... "c": "9"
... },
... {
... "a": "10",
... "b": "11",
... "c": "12"
... },
... {
... "a": "13",
... "b": "8",
... "c": "15"
... }
... ]
>>> from collections import Counter
>>> b_counts = Counter(d["b"] for d in my_list)
>>> list_duplicates = [d for d in my_list if b_counts[d["b"]] > 1]
>>> list_duplicates
[{'a': '1', 'b': '2', 'c': '3'}, {'a': '4', 'b': '2', 'c': '6'}, {'a': '7', 'b': '8', 'c': '9'}, {'a': '13', 'b': '8', 'c': '15'}]
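For reference, a minimal sketch of the plain-dict-and-loop variant mentioned above (same result, assuming the same my_list; Counter is just the tidier spelling of this):
# Count occurrences of each "b" value with an ordinary dict.
b_counts = {}
for d in my_list:
    b_counts[d["b"]] = b_counts.get(d["b"], 0) + 1

# Keep only the dicts whose "b" value occurs more than once.
list_duplicates = [d for d in my_list if b_counts[d["b"]] > 1]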
CodePudding user response:
Here is a functional version using itertools.groupby, itertools.chain, and filter:
from itertools import groupby, chain

list(chain(*filter(lambda x: len(x) > 1,
                   (list(g) for k, g in groupby(sorted(lst, key=lambda d: d['b']),
                                                key=lambda d: d['b']))
                   )))
Explanation: groupby on the value of 'b' (for this the list must be sorted by that key), then filter the groups by size, and finally chain the output to form a single list.
NB: I named the list lst so as not to conflict with the list builtin.
Output:
[{'a': '1', 'b': '2', 'c': '3'},
{'a': '4', 'b': '2', 'c': '6'},
{'a': '7', 'b': '8', 'c': '9'},
{'a': '13', 'b': '8', 'c': '15'}]
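As a variant, the repeated lambdas can be replaced with operator.itemgetter, and chain.from_iterable avoids the argument unpacking; a sketch, assuming the same lst as above:
from itertools import groupby, chain
from operator import itemgetter

# itemgetter('b') plays the role of lambda d: d['b'] for both sorting and grouping.
by_b = itemgetter('b')
groups = (list(g) for _, g in groupby(sorted(lst, key=by_b), key=by_b))
list(chain.from_iterable(g for g in groups if len(g) > 1))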
CodePudding user response:
This approach builds a hash map (countMap) whose keys are the distinct values of 'b' in lst and whose values are the lists of indices where each value occurs.
For the example above, countMap would look like this:
{'2': [0, 1], '8': [2, 4], '11': [3]}
From this, we can build the output list by appending only the dicts from the original list whose 'b' value occurs at more than one index.
# Map each distinct "b" value to the list of indices where it occurs.
countMap = {}
for i in range(len(lst)):
    val = lst[i]['b']
    if countMap.get(val) is not None:
        countMap[val].append(i)
    else:
        countMap[val] = [i]

# Collect the dicts whose "b" value appears at more than one index.
output = []
for i in countMap.keys():
    if len(countMap[i]) > 1:
        for j in countMap[i]:
            output.append(lst[j])

print(output)
Output
[{'a': '1', 'b': '2', 'c': '3'}, {'a': '4', 'b': '2', 'c': '6'}, {'a': '7', 'b': '8', 'c': '9'}, {'a': '13', 'b': '8', 'c': '15'}]
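The get/append bookkeeping can be shortened with collections.defaultdict; a sketch of the same index-map idea, assuming the same lst:
from collections import defaultdict

# Build the same value-to-indices map with less boilerplate.
count_map = defaultdict(list)
for i, d in enumerate(lst):
    count_map[d['b']].append(i)

output = [lst[j] for indices in count_map.values() if len(indices) > 1 for j in indices]
print(output)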
CodePudding user response:
With a list comprehension:
>>> [d for d in lst if len([di for di in lst if di["b"]==d["b"]])>1]
[{'a': '1', 'b': '2', 'c': '3'},
{'a': '4', 'b': '2', 'c': '6'},
{'a': '7', 'b': '8', 'c': '9'},
{'a': '13', 'b': '8', 'c': '15'}]
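Note that this nested comprehension re-scans the whole list for every element, so it is quadratic in the list length. The inner list also does not need to be materialized just to take its length; a sketch of the same idea counting matches lazily (still quadratic, but lighter on memory):
[d for d in lst if sum(di["b"] == d["b"] for di in lst) > 1]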