Let's say we have the following df:
| id  | A | B | C | D |
|-----|---|---|---|---|
| 123 | 1 | 1 | 0 | 0 |
| 456 | 0 | 1 | 1 | 0 |
| 786 | 1 | 0 | 0 | 0 |
The `id` column represents a unique client. Columns `A`, `B`, `C`, and `D` each represent a product, and their values are binary: `1` means the client has that product, `0` means the client doesn't have it.
I want to create a matrix that counts, for every pair of products, how many clients have both products.
This would be the desired output, given the df provided above:
|   | A | B | C | D |
|---|---|---|---|---|
| A | 2 | 1 | 0 | 0 |
| B | 0 | 2 | 1 | 0 |
| C | 0 | 1 | 1 | 0 |
| D | 0 | 0 | 1 | 0 |
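For anyone who wants to reproduce the example locally, here is a minimal sketch (my own, not part of the question) that builds the same frame directly in pandas instead of reading it from a file:

import pandas as pd

# One row per client; the product columns hold the 0/1 flags from the table above.
df = pd.DataFrame({
    'id': [123, 456, 786],
    'A':  [1, 0, 1],
    'B':  [1, 1, 0],
    'C':  [0, 1, 0],
    'D':  [0, 0, 0],
})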
CodePudding user response:
import pandas as pd

# Read the question's table from a whitespace-formatted file.
df = pd.read_fwf('table.dat', infer_nrows=1001)

# Keep only the product columns; multiplying the transposed frame by the
# frame itself counts, for each pair of products, the clients that have both.
cols = ['A', 'B', 'C', 'D']
df2 = df[cols]
df2.T.dot(df2)
Result:
   A  B  C  D
A  2  1  0  0
B  1  2  1  0
C  0  1  1  0
D  0  0  0  0
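For clarity on what df2.T.dot(df2) computes, the same counts come out of a plain matrix product of the 0/1 matrix with its own transpose. This is only an illustration of the idea, not part of the original answer:

import numpy as np

# M is the clients-by-products 0/1 matrix; M.T @ M gives, for every pair of
# products (X, Y), the number of clients that have both X and Y.
M = df[['A', 'B', 'C', 'D']].to_numpy()
counts = M.T @ M   # 4x4 array with the same numbers as df2.T.dot(df2)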
CodePudding user response:
I think you want a dot product:
# Use `id` as the index so only the product columns remain,
# then multiply the transposed frame by the frame itself.
df2 = df.set_index('id')
out = df2.T.dot(df2)
Output:
   A  B  C  D
A  2  1  0  0
B  1  2  1  0
C  0  1  1  0
D  0  0  0  0
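As a quick sanity check (an addition of mine, not from the answer): each off-diagonal entry is the number of clients whose two flags are both 1, and the diagonal holds the per-product client counts.

# Entry (A, B): clients holding both A and B.
both_ab = ((df['A'] == 1) & (df['B'] == 1)).sum()
assert both_ab == out.loc['A', 'B']   # 1 for the sample data

# Diagonal: how many clients hold each single product.
print(out.to_numpy().diagonal())      # [2 2 1 0]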