I want to compare two lists of lists with the columns of a dataframe.
list1 = [['r2','r4','r6'],['r6','r7']]
list2 = [['p4','p5','p8'],['p86','p21','p0','p94']]
Dataset:
|rid|pid|value|
|---|---|----|
|r2|p0|banana|
|r2|p4|chocolate|
|r4|p89|apple|
|r6|p5|milk|
|r7|p0|bread|
Output:
[[chocolate,milk],[bread]]
Since r2 and p4 occur in list1[0] and list2[0] respectively, and also appear in the same row of the dataset, chocolate must be stored. Similarly, r6 and p5 occur in the sublists at the same position (index 0) and in the same row of the dataset, so milk must be stored.
CodePudding user response:
Answer
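result = []
# Walk through the sublists of both lists position by position.
for l1, l2 in zip(list1, list2):
    # Keep rows whose rid is in l1 AND whose pid is in l2, then take their values.
    res = df.loc[df["rid"].isin(l1) & df["pid"].isin(l2), "value"].tolist()
    result.append(res)
print(result)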
[['chocolate', 'milk'], ['bread']]
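Explanation
zip pairs up the sublists of the two lists position by position, so the loop is equivalent to
for i in range(len(list1)):
    l1 = list1[i]
    l2 = list2[i]
df["rid"].isin(l1) & df["pid"].isin(l2) combines the two conditions with the element-wise AND operator &, so a row is kept only when its rid is in l1 and its pid is in l2.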
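To make the mask easier to see, here is a minimal, self-contained sketch for the first pair of sublists only. The dataframe is rebuilt from the table in the question, and the names rid_ok, pid_ok and mask are just illustrative.

```python
import pandas as pd

# Dataframe rebuilt from the table in the question.
df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})

# First pair of sublists: list1[0] and list2[0].
l1 = ["r2", "r4", "r6"]
l2 = ["p4", "p5", "p8"]

rid_ok = df["rid"].isin(l1)  # True where the row's rid is in l1
pid_ok = df["pid"].isin(l2)  # True where the row's pid is in l2
mask = rid_ok & pid_ok       # element-wise AND of the two boolean Series

print(mask.tolist())                   # [False, True, False, True, False]
print(df.loc[mask, "value"].tolist())  # ['chocolate', 'milk']
```

Note that the values come back in dataframe row order, not in the order of the sublists.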
Attention
- list1 and list2 must have the same length; otherwise zip silently ignores the remaining elements of the longer list.
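A small sketch of that truncation, using a hypothetical third sublist ["r9"] that is not part of the question. If the extra sublists should not be dropped, itertools.zip_longest is one possible alternative; the empty-list fill value here is an assumption about what makes sense for this problem.

```python
from itertools import zip_longest

# list1 has one extra sublist (hypothetical, for illustration only).
list1 = [["r2", "r4", "r6"], ["r6", "r7"], ["r9"]]
list2 = [["p4", "p5", "p8"], ["p86", "p21", "p0", "p94"]]

# zip stops at the shorter list, so ["r9"] is silently dropped.
pairs = list(zip(list1, list2))
print(len(pairs))  # 2

# zip_longest keeps going and pads the shorter list with the fill value.
padded = list(zip_longest(list1, list2, fillvalue=[]))
print(len(padded))  # 3
print(padded[2])    # (['r9'], [])
```

With an empty list as the fill value, isin would match nothing for the padded position, so the corresponding result would simply be an empty list.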
CodePudding user response:
You can do it as follows:
import pandas as pd
from itertools import product

df = pd.DataFrame({'rid': {0: 'r2', 1: 'r2', 2: 'r4', 3: 'r6', 4: 'r7'},
                   'pid': {0: 'p0', 1: 'p4', 2: 'p89', 3: 'p5', 4: 'p0'},
                   'value': {0: 'banana', 1: 'chocolate', 2: 'apple', 3: 'milk', 4: 'bread'}})
list1 = [['r2','r4','r6'],['r6','r7']]
list2 = [['p4','p5','p8'],['p86','p21','p0','p94']]
# Generate all possible (rid, pid) associations for each pair of sublists.
associations = (product(l1, l2) for l1, l2 in zip(list1, list2))
# Index by (rid, pid) for speed and convenience of the lookup.
df = df.set_index(['rid', 'pid']).sort_index()
# Keep only the associations that actually exist as a row in the dataframe.
output = [[df.loc[assoc, 'value'] for assoc in assoc_list if assoc in df.index]
          for assoc_list in associations]
print(output)
[['chocolate', 'milk'], ['bread']]
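To see what "all possible associations" means, here is a short sketch of what product yields for one pair of sublists. The second sublist is shortened to two elements purely to keep the output readable.

```python
from itertools import product

l1 = ["r6", "r7"]
l2 = ["p86", "p0"]  # shortened version of list2[1], for illustration only

# product yields every (rid, pid) combination, with l1 varying slowest.
pairs = list(product(l1, l2))
print(pairs)
# [('r6', 'p86'), ('r6', 'p0'), ('r7', 'p86'), ('r7', 'p0')]
```

Each tuple is then looked up in the (rid, pid) index; in the example data only ('r7', 'p0') exists, which gives bread.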