I have a dataset:
   list1    list2
0  [1,3,4]  [4,3,2]
1  [1,3,2]  [0,4,6]
2  [4,5,8]  NA
3  [6,3,7]  [8,2,3]
Is there a way to count the common terms for each index?
In the expected output, intersection_0 compares row 0 of list1 with each row of list2, intersection_1 compares row 1 of list1 with each row of list2, and so on.
Expected output:
intersection_0  intersection_1  intersection_2  intersection_3
1               2               1               1
1               0               1               1
0               0               0               0
1               2               0               1
For the row-wise intersection I was trying:
df['intersection'] = [len(set(a).intersection(b)) for a, b in zip(df1.list1, df1.list2)]
Is there a better or faster way to achieve this? Thank you in advance.
CodePudding user response:
The double loop would go like this:
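import numpy as np
import pandas as pd

intersections = []
for l2 in df['list2']:
    # one output row per list2 entry
    intersection = []
    for l1 in df['list1']:
        try:
            i = len(np.intersect1d(l1, l2))
        except Exception:
            # entries that cannot be compared count as no common terms
            i = 0
        intersection.append(i)
    intersections.append(intersection)
out = pd.DataFrame(intersections)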
Output:
   0  1  2  3
0  2  2  1  1
1  1  0  1  1
2  0  0  0  0
3  1  2  1  1
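If the double loop turns out to be slow, one possible variant is to convert every list to a Python set once up front and then count the overlaps with the `&` operator. This is only a sketch, not a benchmarked solution: the helper names `to_set` and `pairwise_counts` are made up for illustration, the sample lists are copied from the question, and NA is assumed to be a non-list value such as NaN, which is treated as an empty set.

```python
def to_set(value):
    # Lists, tuples and sets become sets; anything else
    # (e.g. NaN or None standing in for NA) becomes an empty set.
    return set(value) if isinstance(value, (list, tuple, set)) else set()


def pairwise_counts(list1, list2):
    # Convert each entry once instead of once per comparison.
    sets1 = [to_set(v) for v in list1]
    sets2 = [to_set(v) for v in list2]
    # One row per list2 entry, one column per list1 entry.
    return [[len(s1 & s2) for s1 in sets1] for s2 in sets2]


# Sample data from the question; NaN stands in for NA (an assumption).
list1 = [[1, 3, 4], [1, 3, 2], [4, 5, 8], [6, 3, 7]]
list2 = [[4, 3, 2], [0, 4, 6], float("nan"), [8, 2, 3]]

counts = pairwise_counts(list1, list2)
for row in counts:
    print(row)
```

With a DataFrame, the same function should accept the columns directly, for example `pd.DataFrame(pairwise_counts(df['list1'], df['list2']))`, since iterating over a Series yields its values. Like `np.intersect1d`, a set intersection counts each common value once, so duplicates inside a list do not inflate the count.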