Comparing two lists of lists with a DataFrame column in Python

I want to compare two lists of lists with the columns of a DataFrame.
list1 = [['r2', 'r4', 'r6'], ['r6', 'r7']]
list2 = [['p4', 'p5', 'p8'], ['p86', 'p21', 'p0', 'p94']]

Dataset:

|rid|pid|value|
|---|---|----|
|r2|p0|banana|
|r2|p4|chocolate|
|r4|p89|apple|
|r6|p5|milk|
|r7|p0|bread|

Expected output: [['chocolate', 'milk'], ['bread']]

Since r2 is in list1[0], p4 is in list2[0], and both occur in the same row of the dataset, chocolate must be stored. Similarly, r6 and p5 occur in the sublists at the same index (list1[0] and list2[0]) and in the same row of the dataset, so milk must be stored.

CodePudding user response：

Answer

result = []
for l1, l2 in zip(list1, list2):
    # Keep rows whose rid is in l1 and whose pid is in l2, then collect their values.
    res = df.loc[df["rid"].isin(l1) & df["pid"].isin(l2), "value"].tolist()
    result.append(res)

[['chocolate', 'milk'], ['bread']]

Explanation

zip pairs up the sublists of the two lists by index, which is equivalent to

for i in range(len(list1)):
    l1 = list1[i]
    l2 = list2[i]

df["rid"].isin(l1) & df["pid"].isin(l2) combines the two conditions with the element-wise AND operator &, so only rows whose rid is in l1 and whose pid is in l2 are selected.
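
To see how the two conditions combine, here is a small sketch that prints each boolean mask separately for the first pair of sublists. The DataFrame is rebuilt from the table in the question; the names rid_mask, pid_mask and both are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})

l1 = ["r2", "r4", "r6"]  # list1[0]
l2 = ["p4", "p5", "p8"]  # list2[0]

rid_mask = df["rid"].isin(l1)  # True where rid is one of l1
pid_mask = df["pid"].isin(l2)  # True where pid is one of l2
both = rid_mask & pid_mask     # element-wise AND of the two masks

print(rid_mask.tolist())               # [True, True, True, True, False]
print(pid_mask.tolist())               # [False, True, False, True, False]
print(both.tolist())                   # [False, True, False, True, False]
print(df.loc[both, "value"].tolist())  # ['chocolate', 'milk']
```

Only the rows where both masks are True survive, which is why banana (rid matches, pid does not) is left out.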

Attention

list1 and list2 must have the same length; otherwise, zip stops at the end of the shorter list and silently ignores the remaining elements of the longer one.
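
A short sketch of that pitfall, using a made-up third sublist ['r9'] that has no partner in list2. The helper paired is hypothetical and just one possible way to guard against the mismatch; on Python 3.10 or later, zip(list1, list2, strict=True) raises a ValueError in the same situation.

```python
# list1 has one sublist more than list2; ["r9"] is invented for this example.
list1 = [["r2", "r4", "r6"], ["r6", "r7"], ["r9"]]
list2 = [["p4", "p5", "p8"], ["p86", "p21", "p0", "p94"]]

pairs = list(zip(list1, list2))
print(len(pairs))  # 2, the unmatched ["r9"] is dropped without any warning


def paired(a, b):
    """Hypothetical helper: like zip, but refuses inputs of different lengths."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return list(zip(a, b))


try:
    paired(list1, list2)
except ValueError as exc:
    print(exc)  # length mismatch: 3 != 2
```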

CodePudding user response：

You can do it as follows:

from itertools import product

import pandas as pd

df = pd.DataFrame({'rid': {0: 'r2', 1: 'r2', 2: 'r4', 3: 'r6', 4: 'r7'},
 'pid': {0: 'p0', 1: 'p4', 2: 'p89', 3: 'p5', 4: 'p0'},
 'value': {0: 'banana', 1: 'chocolate', 2: 'apple', 3: 'milk', 4: 'bread'}})
list1 = [['r2','r4','r6'],['r6','r7']]
list2 = [['p4','p5','p8'],['p86','p21','p0','p94']]

# Generate all possible associations.
associations = (product(l1, l2) for l1, l2 in zip(list1, list2))

# Index for speed and convenience of the lookup.
df = df.set_index(['rid', 'pid']).sort_index()

output = [[df.loc[assoc, 'value'] for assoc in assoc_list if assoc in df.index] 
          for assoc_list in associations]

print(output)

[['chocolate', 'milk'], ['bread']]
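
For reference, here is a small sketch of what product generates for the second pair of sublists in the answer above. Each (rid, pid) tuple is a candidate key that is then looked up in the MultiIndex; the variable names are only illustrative.

```python
from itertools import product

l1 = ["r6", "r7"]                  # list1[1]
l2 = ["p86", "p21", "p0", "p94"]   # list2[1]

# Every rid from l1 combined with every pid from l2.
pairs = list(product(l1, l2))

print(len(pairs))             # 8
print(pairs[:4])              # [('r6', 'p86'), ('r6', 'p21'), ('r6', 'p0'), ('r6', 'p94')]
print(("r7", "p0") in pairs)  # True, the only pair that exists in the dataset
```

Note that the number of candidate pairs grows as len(l1) * len(l2), so for very long sublists the isin approach may be the cheaper one.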