Compare multiple columns with each other to calculate association

Time:10-18

I have a pandas DataFrame with 100 columns like this:

Product A   Product B   Product C   ...
1           1           1
1           0           1
0           1           1
0           0           0
1           1           1
0           0           0

I want to see how many customers who buy Product A also buy Product B, and so on for every pair of products. For example, out of the 3 customers who buy Product A, 2 also buy Product B, so the result is 2/3 = 0.6667.

          B 
        No  Yes 
A   Yes 1   2    - Result 0.666666667

Similarly, for A-C:

          C
        No  Yes 
A   Yes 0   3    - Result 1

My expected output is:

Combination    Result               
A-B             0.66            
A-C             1           
B-A             ..
B-C             ..
C-A             ..
C-B             ..

I am using pandas.groupby() and looping over the different columns. Is there a more efficient way to achieve this?

Answer:

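Use the @ operator for matrix multiplication. df.T @ df gives the co-occurrence counts (how many customers bought each pair of products), and dividing by df.sum() divides each column by that product's total number of buyers: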

(df.T @ df)/df.sum()

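Output (the entry in row X, column Y is the share of Y's buyers who also bought X, so the A-C value of 1 sits in row Product C, column Product A):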

           Product A  Product B  Product C
Product A   1.000000   0.666667       0.75
Product B   0.666667   1.000000       0.75
Product C   1.000000   1.000000       1.00
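
To get the Combination / Result layout from the question, the matrix can be reshaped into one row per ordered pair. The following is a minimal sketch, not part of the original answer. It assumes the sample data from the question, divides along rows (axis=0) so that the row label is the product already bought, and uses the names cooc, cond, pairs and result purely for illustration.

```python
import pandas as pd

# Sample data from the question (three of the 100 columns).
df = pd.DataFrame({
    "Product A": [1, 1, 0, 0, 1, 0],
    "Product B": [1, 0, 1, 0, 1, 0],
    "Product C": [1, 1, 1, 0, 1, 0],
})

# cooc.loc[x, y] = number of customers who bought both x and y.
cooc = df.T @ df

# Divide each ROW by that product's number of buyers, so that
# cond.loc[x, y] = share of x's buyers who also bought y.
cond = cooc.div(df.sum(), axis=0)

# Long format: one row per ordered pair, self-pairs dropped.
pairs = (
    cond.rename_axis("Given").reset_index()
        .melt(id_vars="Given", var_name="Also", value_name="Result")
        .query("Given != Also")
        .sort_values(["Given", "Also"], ignore_index=True)
)

# Build labels such as "A-B" by stripping the "Product " prefix.
given = pairs["Given"].str.replace("Product ", "", regex=False)
also = pairs["Also"].str.replace("Product ", "", regex=False)
result = pd.DataFrame({"Combination": given + "-" + also,
                       "Result": pairs["Result"]})
print(result)
```

For the sample data this gives A-B = 0.667 and A-C = 1.0, matching the values worked out in the question, while C-A = 0.75 because only 3 of the 4 Product C buyers also bought Product A.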