Home > database >  Table with most frequent combinations with pandas python
Table with most frequent combinations with pandas python

Time:03-09

The dataset

ID Product
1   A
1   B
2   A 
3   A
3   C 
3   D 
4   A
4   B
5   A
5   C
5   D
.....

The goal is to have the most frequent combinaisons of product by ID regardless the number of string value.

The expected result here is :

[A, C, D]  2
[A, B]     2
[A, C]     2
......

Like that but with a working value

import itertools

(df.groupby('ID').Product.agg(lambda x: list(set(itertools.combinations(x,**?**))))
                 .explode().str.join('-').value_counts())

CodePudding user response:

IIUC, groupby ID, aggregate to frozenset and count the occurrences with value_counts:

df.groupby('ID')['Product'].agg(frozenset).value_counts()

output:

(B, A)       2
(D, C, A)    2
(A)          1
Name: Product, dtype: int64

Alternative using sorted tuples:

df.groupby('ID')['Product'].agg(lambda x: tuple(sorted(x))).value_counts()

output:

(A, B)       2
(A, C, D)    2
(A,)         1
Name: Product, dtype: int64

Or strings:

df.groupby('ID')['Product'].agg(lambda x: ','.join(sorted(x))).value_counts()

output:

A,B      2
A,C,D    2
A        1
Name: Product, dtype: int64
  • Related