Let's say I have a list of dicts, called mydict, that looks like this:
[{'id': 6384,
'character': 'Thomas A. Anderson / Neo',
'credit_id': '52fe425bc3a36847f80181c1',
'movie_id': 603},
{'id': 2975,
'character': 'Morpheus',
'credit_id': '52fe425bc3a36847f801818d',
'movie_id': 603},
{'id': 530,
'character': 'Trinity',
'credit_id': '52fe425bc3a36847f8018191',
'movie_id': 603},
{'id': 1331,
'character': 'Agent Smith',
'credit_id': '52fe425bc3a36847f8018195',
'movie_id': 603},
{'id': 3165802,
'character': 'MP Sergeant #1',
'credit_id': '62ade87f4142910051c8e002',
'movie_id': 28},
{'id': 18471,
'character': 'Self',
'credit_id': '6259ed263acd2016291eef43',
'movie_id': 963164},
{'id': 74611,
'character': 'Self',
'credit_id': '6259ed37ecaef515ff68cae6',
'movie_id': 963164}]
and I want to get all pairs of 'id' values from dicts that share the same 'movie_id' value, using only the Python standard library. Essentially, returning:
(6384, 2975)
(6384, 530)
(6384, 1331)
....
(18471, 74611)
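Looping through every possible combination would work, but seems slow, with something like this:
results = []
for i in mydict:
    for j in mydict:
        current = i['movie_id']
        other = j['movie_id']  # renamed from `next` to avoid shadowing the builtin
        if current == other:
            # append() takes a single argument, so the pair goes in as a tuple
            results.append((i['id'], j['id']))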
Is there a dictionary comprehension way to achieve the same result?
CodePudding user response:
Consider using a collections.defaultdict() to group the ids by movie_id. Then use itertools.combinations() to loop over each group pairwise:
from collections import defaultdict
from itertools import combinations
# Map each movie_id to the list of ids credited on it
d = defaultdict(list)
for credit in mydict:
    d[credit['movie_id']].append(credit['id'])
# Emit every unordered pair within each group
for group in d.values():
    for pair in combinations(group, 2):
        print(pair)
For the given dataset, this outputs:
(6384, 2975)
(6384, 530)
(6384, 1331)
(2975, 530)
(2975, 1331)
(530, 1331)
(18471, 74611)
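If the pairs are needed as a list rather than printed, the same grouping can be wrapped in a small helper. This is only a minimal sketch: the function name same_movie_pairs and the trimmed sample list are made up for illustration and are not part of the question.

```python
from collections import defaultdict
from itertools import combinations

def same_movie_pairs(records):
    """Return all pairs of 'id' values that share a 'movie_id'."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec['movie_id']].append(rec['id'])
    # combinations() yields each unordered pair once, in input order
    return [pair for ids in groups.values() for pair in combinations(ids, 2)]

# Trimmed sample (hypothetical subset of the question's data)
sample = [
    {'id': 6384, 'movie_id': 603},
    {'id': 3165802, 'movie_id': 28},
    {'id': 2975, 'movie_id': 603},
    {'id': 18471, 'movie_id': 963164},
    {'id': 74611, 'movie_id': 963164},
]
print(same_movie_pairs(sample))
# [(6384, 2975), (18471, 74611)]
```

The grouping takes one pass over the list and does not depend on the order of the input, so entries for the same movie do not have to be adjacent.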
CodePudding user response:
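If a third-party dependency is acceptable (the question asks for the standard library only), pandas can do this with a self-join on movie_id. Note that comparing df['id'] with df['movie_id'] directly would only select rows whose id equals their own movie_id, which is not what is being asked.
import pandas as pd
df = pd.DataFrame(mydict).reset_index()  # 'index' keeps the original row position
# Join every row with every other row that has the same movie_id
merged = df.merge(df, on='movie_id', suffixes=('_a', '_b'))
# Keep each unordered pair once and drop rows paired with themselves
pairs = merged.loc[merged['index_a'] < merged['index_b'], ['id_a', 'id_b']]
print(list(pairs.itertuples(index=False, name=None)))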
CodePudding user response:
You can do this with itertools.groupby and itertools.combinations. Note that groupby only groups consecutive items, so dicts with the same movie_id are expected to sit next to each other in the main list. If they do not, sort the main list by movie_id first.
In [18]: from itertools import groupby
In [19]: from itertools import combinations
In [20]: for k, l in groupby(mydict, key=lambda x: x['movie_id']):
    ...:     print(list(combinations([i.get('id') for i in l], 2)))
    ...:
[(6384, 2975), (6384, 530), (6384, 1331), (2975, 530), (2975, 1331), (530, 1331)]
[]
[(18471, 74611)]
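The sorting caveat matters in practice. As a rough illustration, assuming a hypothetical list unsorted_credits in which the entries for movie 603 are split up, groupby finds no pairs until the list is sorted by the same key:

```python
from itertools import combinations, groupby
from operator import itemgetter

# Hypothetical input where entries for movie 603 are not adjacent
unsorted_credits = [
    {'id': 6384, 'movie_id': 603},
    {'id': 3165802, 'movie_id': 28},
    {'id': 2975, 'movie_id': 603},
]

by_movie = itemgetter('movie_id')

# Without sorting, groupby sees three separate runs and finds no pairs
unsorted_pairs = [
    pair
    for _, group in groupby(unsorted_credits, key=by_movie)
    for pair in combinations([rec['id'] for rec in group], 2)
]

# Sorting by the same key first makes equal movie_ids adjacent
sorted_pairs = [
    pair
    for _, group in groupby(sorted(unsorted_credits, key=by_movie), key=by_movie)
    for pair in combinations([rec['id'] for rec in group], 2)
]

print(unsorted_pairs)  # []
print(sorted_pairs)    # [(6384, 2975)]
```

Python's sort is stable, so ids within the same movie keep their original relative order after sorting.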
CodePudding user response:
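Using pandas:
import itertools
import pandas as pd
# lst is your list of dicts
out = pd.DataFrame(lst).groupby('movie_id')['id'].apply(
    lambda x: list(itertools.combinations(x, 2))).to_dict()
Using itertools (the list is sorted by movie_id first, because groupby only groups adjacent items):
from itertools import combinations, groupby
out = {
    k: list(combinations([d['id'] for d in g], 2))
    for k, g in groupby(sorted(lst, key=lambda x: x['movie_id']), key=lambda x: x['movie_id'])
}
print(out) gives: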
{28: [],
603: [(6384, 2975),
(6384, 530),
(6384, 1331),
(2975, 530),
(2975, 1331),
(530, 1331)],
963164: [(18471, 74611)]}
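If a flat list of pairs is wanted, as in the question, rather than a dict keyed by movie_id, the per-movie lists can be chained together. A small sketch, with out written out by hand from the output above:

```python
from itertools import chain

# Copied from the printed result above
out = {
    28: [],
    603: [(6384, 2975), (6384, 530), (6384, 1331),
          (2975, 530), (2975, 1331), (530, 1331)],
    963164: [(18471, 74611)],
}

# Concatenate the per-movie lists; movies with no pairs contribute nothing
pairs = list(chain.from_iterable(out.values()))
print(len(pairs))           # 7
print(pairs[0], pairs[-1])  # (6384, 2975) (18471, 74611)
```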