Count combinations of values using pandas crosstab

Time:07-08

I would like to count how often each combination of values (pets) occurs together within a group (user). The code below gives the desired result. However, I'm looking for a more idiomatic pandas ('pandamic') way, perhaps using the crosstab method. Any suggestions for a less verbose solution?

import pandas as pd
import numpy as np
import itertools
df1 = pd.DataFrame({'user':['Jane', 'Matthew', 'Emily'], 'pets':[['dog', 'cat', 'lizard'], ['dog', 'spider'], ['dog', 'cat', 'monkey']]}).explode('pets')
combinations = []
# collect every pair of pets that belong to the same user
for g in df1.groupby('user'): combinations += [x for x in itertools.combinations(g[1].pets, 2)]
df2 = pd.DataFrame(np.zeros((df1.pets.nunique(), df1.pets.nunique()), dtype=int), columns=df1.pets.unique(), index=df1.pets.unique())
# count each pair in both directions so the table is symmetric
for x in combinations:
    df2.at[x[0], x[1]] += 1
    df2.at[x[1], x[0]] += 1
print(df2)

Result:

        dog  cat  lizard  spider  monkey
dog       0    2       1       1       1
cat       2    0       1       0       1
lizard    1    1       0       0       0
spider    1    0       0       0       0
monkey    1    1       0       0       0

CodePudding user response:

Use DataFrame.merge with crosstab:

df = df1.merge(df1, on='user')
df = pd.crosstab(df.pets_x, df.pets_y).rename_axis(index=None, columns=None)
print(df)

        cat  dog  lizard  monkey  spider
cat       2    2       1       1       0
dog       2    3       1       1       1
lizard    1    1       1       0       0
monkey    1    1       0       1       0
spider    0    1       0       0       1
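
To see why the self-merge works, it can help to inspect the merged frame before it is cross-tabulated. The following is only a small sketch that reuses the question's data; the names `pairs` and `counts` are mine and do not appear in the answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'user': ['Jane', 'Matthew', 'Emily'],
    'pets': [['dog', 'cat', 'lizard'], ['dog', 'spider'], ['dog', 'cat', 'monkey']],
}).explode('pets')

# Merging df1 with itself on 'user' pairs every pet of a user with every pet
# of the same user (including itself), so a user with n pets yields n * n rows.
pairs = df1.merge(df1, on='user')
print(len(pairs))                        # 3*3 + 2*2 + 3*3 rows
print(pairs[pairs.user == 'Matthew'])    # dog/dog, dog/spider, spider/dog, spider/spider

# crosstab then counts how many rows fall on each (pets_x, pets_y) pair.
counts = pd.crosstab(pairs.pets_x, pairs.pets_y)
print(counts)
```

Because each pet is also paired with itself, the diagonal holds the number of users that own that pet (3 for dog, 2 for cat), which is why the question's table and this one differ only on the diagonal and in row order.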

If you need to set the values on the diagonal to 0, add numpy.fill_diagonal:

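df = df1.merge(df1, on='user')
df = pd.crosstab(df.pets_x, df.pets_y).rename_axis(index=None, columns=None)
# to_numpy() is not guaranteed to return a writable view of the frame's data
# (for example with Copy-on-Write enabled), so fill the diagonal on an explicit copy
arr = df.to_numpy(copy=True)
np.fill_diagonal(arr, 0)
df = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df)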
        cat  dog  lizard  monkey  spider
cat       0    2       1       1       0
dog       2    0       1       1       1
lizard    1    1       0       0       0
monkey    1    1       0       0       0
spider    0    1       0       0       0
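
As a possible alternative that avoids the self-merge, the same counts can be obtained from a user-by-pet indicator table multiplied by its own transpose. This is a different technique from the one in the answer, shown only as a sketch; the names `owns`, `co` and `order` are my own, and it assumes each user lists a given pet at most once.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'user': ['Jane', 'Matthew', 'Emily'],
    'pets': [['dog', 'cat', 'lizard'], ['dog', 'spider'], ['dog', 'cat', 'monkey']],
}).explode('pets')

# user x pet table: 1 where the user owns the pet, 0 otherwise
owns = pd.crosstab(df1.user, df1.pets)

# (pet x user) @ (user x pet) sums, for each pair of pets,
# the number of users that own both
co = (owns.T @ owns).rename_axis(index=None, columns=None)

# zero the diagonal on an explicit copy of the values
arr = co.to_numpy(copy=True)
np.fill_diagonal(arr, 0)
co = pd.DataFrame(arr, index=co.index, columns=co.columns)

# optional: restore the order of first appearance used in the question
order = df1.pets.unique()
co = co.loc[order, order]
print(co)
```

The final reindexing step is only there to match the row and column order of the question's table; without it the labels come out sorted alphabetically, as in the crosstab output above.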
