Home > Blockchain >  Count combination of values in pandas dataframe
Count combination of values in pandas dataframe

Time:08-24

Let's say we have the following df:

id A B C D
123 1 1 0 0
456 0 1 1 0
786 1 0 0 0

The id column represents a unique client.

Columns A, B, C, and D represent a product. These columns' values are binary.

1 means the client has that product. 0 means the client doesn't have that product.

I want to create a matrix table of sorts that counts the number of combinations of products that exist for all users.

This would be the desired output, given the df provided above:

A B C D
A 2 1 0 0
B 0 2 1 0
C 0 1 1 0
D 0 0 1 0

CodePudding user response:

import pandas as pd

df = pd.read_fwf('table.dat', infer_nrows=1001)
cols = ['A', 'B', 'C', 'D']
df2 = df[cols]
df2.T.dot(df2)

Result:

    A   B   C   D
A   2   1   0   0
B   1   2   1   0
C   0   1   1   0
D   0   0   0   0

CodePudding user response:

I think you want a dot product:

df2 = df.set_index('id')

out = df2.T.dot(df2)

Output:

   A  B  C  D
A  2  1  0  0
B  1  2  1  0
C  0  1  1  0
D  0  0  0  0
  • Related