I have a pandas DataFrame with 100 columns, like this:
Product A  Product B  Product C  ....
        1          1          1
        1          0          1
        0          1          1
        0          0          0
        1          1          1
        0          0          0
I want to see how many customers who buy Product A also buy Product B, and so on for every pair of products. For example, out of 3 customers who buy Product A, 2 also buy Product B, so the result is 2/3 = 0.6667:
          B
          No  Yes
A  Yes     1    2   - Result 0.666666667
Similarly, for A - C:
          C
          No  Yes
A  Yes     0    3   - Result 1
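As a minimal sketch of the calculation for a single pair, assuming the six sample rows above are loaded into a DataFrame named df (the variable names a_buyers, share_a_b and share_a_c are illustrative only):

```python
import pandas as pd

# The six sample rows from above (1 = the customer bought the product).
df = pd.DataFrame({
    "Product A": [1, 1, 0, 0, 1, 0],
    "Product B": [1, 0, 1, 0, 1, 0],
    "Product C": [1, 1, 1, 0, 1, 0],
})

# Keep only the customers who bought Product A.
a_buyers = df[df["Product A"] == 1]

# The mean of a 0/1 column is the share of ones, so this is the share
# of Product A buyers who also bought Product B (2 of 3).
share_a_b = a_buyers["Product B"].mean()

# Same idea for Product C (3 of 3).
share_a_c = a_buyers["Product C"].mean()

print(share_a_b, share_a_c)
```

Repeating this for every ordered pair is what the loop does, which is why it gets slow with 100 columns.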
My expected output is:
Combination  Result
A-B          0.66
A-C          1
B-A          ..
B-C          ..
C-A          ..
C-B          ..
I am using pandas.groupby() and looping over the different columns. Is there a more efficient way to achieve this?
CodePudding user response:
Use @ for matrix multiplication, which computes the co-occurrence counts for you. Dividing by df.sum() then scales each column by that product's number of buyers:
(df.T @ df)/df.sum()
Output:
           Product A  Product B  Product C
Product A   1.000000   0.666667       0.75
Product B   0.666667   1.000000       0.75
Product C   1.000000   1.000000       1.00
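One thing to watch in the matrix above: because the division is applied column by column, the entry in row X, column Y is the share of Y's buyers who also bought X. If the question's "A-B" is read as "the share of A's buyers who also bought B", that value sits in row Product B, column Product A. The sketch below is one possible way to get the Combination / Result layout from the question directly, assuming the six sample rows and dividing along rows instead. The names cooc, share and result, and the stripping of the "Product " prefix, are my own choices, not part of the original answer.

```python
from itertools import permutations

import pandas as pd

# Sample data from the question (1 = the customer bought the product).
df = pd.DataFrame({
    "Product A": [1, 1, 0, 0, 1, 0],
    "Product B": [1, 0, 1, 0, 1, 0],
    "Product C": [1, 1, 1, 0, 1, 0],
})

# Co-occurrence counts: entry [X, Y] = customers who bought both X and Y.
cooc = df.T @ df

# Divide each ROW by that product's buyer count, so entry [X, Y] becomes
# the share of X's buyers who also bought Y.
share = cooc.div(df.sum(), axis=0)

# Flatten to one row per ordered pair, skipping the X-X diagonal.
rows = [
    (f"{x.replace('Product ', '')}-{y.replace('Product ', '')}", share.loc[x, y])
    for x, y in permutations(df.columns, 2)
]
result = pd.DataFrame(rows, columns=["Combination", "Result"])

print(result)
```

The heavy lifting is still the single matrix product. The list comprehension only reads values out of the finished matrix, so for 100 columns it touches 9,900 cells once rather than regrouping the data for every pair.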