Finding different values for the same key in different dictionaries in Python

I have many dictionaries in one list. For example:

totalList = [
{'id': 1111, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'}, 
{'id': 1412, 'source': 'user_2', 'count_id': 5, 'description': 'bbbb'}, 
{'id': 5123, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'}, 
{'id': 1982, 'source': 'user_3', 'count_id': 7, 'description': 'bbbb'},
{'id': 3198, 'source': 'user_3', 'count_id': 7, 'description': 'bbbb'},
{'id': 1082, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'}
              ]

The ids are always different.
All dictionaries have the same keys.

I want to find the ids of the entries that share the same source, count_id and description values. In this example, I only need the ids. Output:

1111, 5123, 1082 same
1982, 3198 same

How can I achieve this?

Thanks.

CodePudding user response：

I'd reformat the data into a dictionary of items, where each key is a tuple of the three values you care about. Then you can iterate through the dictionary and efficiently find duplicates.

# Original data
totalList = [
    {'id': 1111, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'}, 
    {'id': 1412, 'source': 'user_2', 'count_id': 5, 'description': 'bbbb'}, 
    {'id': 5123, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'}, 
    {'id': 1982, 'source': 'user_3', 'count_id': 7, 'description': 'bbbb'},
    {'id': 3198, 'source': 'user_3', 'count_id': 7, 'description': 'bbbb'},
    {'id': 1082, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'}
]

# Detect duplicates
from collections import defaultdict

def get_key(item):
    return (item['source'], item['count_id'], item['description'])

ids_by_source_count_and_desc = defaultdict(list)
for item in totalList:
    ids_by_source_count_and_desc[get_key(item)].append(item['id'])

# Report only the keys shared by more than one id
for key, ids in ids_by_source_count_and_desc.items():
    if len(ids) > 1:
        print(key, "same", ids)

I also use defaultdict to avoid having to check if the dictionary I'm inserting into already contains a list.

Output:

('user_1', 10, 'aaaa') same [1111, 5123, 1082]
('user_3', 7, 'bbbb') same [1982, 3198]
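
To illustrate the defaultdict remark, here is a rough sketch of the same grouping with a plain dict, where the "is there already a list for this key?" check has to be written by hand. The names items and group_ids_plain are made up for this example, and the data is a trimmed copy of the list from the question.

```python
# Hypothetical sample data, trimmed from the list in the question
items = [
    {'id': 1111, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'},
    {'id': 1412, 'source': 'user_2', 'count_id': 5, 'description': 'bbbb'},
    {'id': 5123, 'source': 'user_1', 'count_id': 10, 'description': 'aaaa'},
]

def group_ids_plain(items):
    """Group ids by (source, count_id, description) using a plain dict."""
    groups = {}
    for item in items:
        key = (item['source'], item['count_id'], item['description'])
        # This explicit check is what defaultdict(list) does for you
        if key not in groups:
            groups[key] = []
        groups[key].append(item['id'])
    return groups

plain_groups = group_ids_plain(items)
print(plain_groups)
```

The result is the same mapping either way, so which one to use is mostly a matter of taste; dict.setdefault(key, []) would be another way to skip the explicit check.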

CodePudding user response：

Personally, I find that pandas often makes this kind of task faster and simpler to write. Here is what I came up with:

import pandas as pd

df = pd.DataFrame(totalList)
result = {}
# Group the ids by the three columns that must match
groups = df.groupby(by=["source", "count_id", "description"])["id"]
for name, group in groups:
    tempList = group.tolist()
    # Keep only the groups that contain more than one id
    if len(tempList) > 1:
        result[name] = tempList
result

Output

{('user_1', 10, 'aaaa'): [1111, 5123, 1082],
 ('user_3', 7, 'bbbb'): [1982, 3198]}

To get the same output as in your question, you just need to loop over the result variable and use the join function on each list:

for key, value in result.items():
    print(",".join(str(v) for v in value) + " same")

Final Output

1111,5123,1082 same
1982,3198 same

Note that we need str(v) for v in value inside the join call because join only accepts strings, and the lists contain integers rather than strings.
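
A minimal sketch of why the conversion is needed, assuming a list of ids like the ones in the question (the variable names are only for illustration): str.join raises a TypeError when given non-string items, while converting each item first works.

```python
ids = [1111, 5123, 1082]  # plain integers, as in the question's data

# Joining the integers directly fails, because join expects strings
try:
    ",".join(ids)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each id to a string first works
line = ",".join(str(v) for v in ids) + " same"
print(join_failed)
print(line)
```

If you prefer, print(*ids, sep=",") avoids the explicit conversion, since print converts each argument itself.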