Home > front end >  Count Frequency of all combinations
Count Frequency of all combinations

Time:01-05

I have the following df, for the sake of simplicity, not all rows are shown. The rows are different people how display, which product they would choose.

   Product1 Product2 Product3 Product4 Product5 Product6 Product7 Product8
0    1        0          0       1       0        1         0       0
1    0        1          0       0       0        0         0       1
2    1        0          1       1       1        1         0       0
3    1        0          1       0       1        0         1       0

Right now I do work with the following made up df:

df = pd.DataFrame(np.random.randint(2, size=(566,8)))

At the end I would like to know, how many people choose which combination. I came across this approach create all possible combinations. But that did not clear things up for me.

How would you count all the rows that have the same entries by the columns and show at which columns they have an entry ? could someone guide me into the right direction ?

CodePudding user response:

Use pd.DataFrame.value_counts()

import pandas as pd

df = pd.DataFrame(
    np.random.randint(2, size=(566,8)),
    columns=[f"Product{i}" for i in range(1, 9)]
)

out = df.value_counts()

The output is a series with each of the original rows as its index, and the number of times the corresponding row occurs in the dataframe as its values. You can reset the index to obtain back a dataframe:

out = out.rename("Number of Occurences").reset_index()
     Product1  Product2  Product3  Product4  Product5  Product6  Product7  Product8  Number of Occurences
0           1         0         1         0         1         1         1         0                     9
1           1         0         0         0         0         0         1         1                     6
2           0         0         1         1         1         0         0         0                     6
3           0         0         1         0         0         1         0         1                     6
4           1         1         1         1         1         1         1         1                     5
..        ...       ...       ...       ...       ...       ...       ...       ...                   ...
230         1         0         1         1         1         1         0         1                     1
231         0         1         1         1         0         1         0         1                     1
232         0         1         1         1         0         1         1         1                     1
233         1         0         1         1         1         0         0         1                     1
234         1         0         1         0         1         1         0         1                     1

CodePudding user response:

Label rows with the product combinations, then use value_counts.

df['Combinations'] = df.apply(lambda row: ' '.join(df.columns[row.astype(bool)]), axis=1)

df.Combinations.value_counts()

Product1 Product2 Product6 Product7                      7
Product1 Product2 Product4 Product5 Product6 Product8    7
Product1 Product8                                        7
Product1 Product3 Product5 Product6                      6
Product1 Product3 Product5 Product6 Product7 Product8    6
                                                        ..
Product5 Product6 Product7 Product8                      1
Product4 Product6                                        1
Product1 Product6                                        1
Product1 Product3 Product4 Product6 Product7 Product8    1
Product2 Product5 Product7                               1
  • Related