The dataset
ID Product
1 A
1 B
2 A
3 A
3 C
3 D
4 A
4 B
5 A
5 C
5 D
.....
The goal is to have the most frequent combinaisons of product by ID regardless the number of string value.
The expected result here is :
[A, C, D] 2
[A, B] 2
[A, C] 2
......
Like that but with a working value
import itertools
(df.groupby('ID').Product.agg(lambda x: list(set(itertools.combinations(x,**?**))))
.explode().str.join('-').value_counts())
CodePudding user response:
IIUC, groupby
ID, aggregate to frozenset
and count the occurrences with value_counts
:
df.groupby('ID')['Product'].agg(frozenset).value_counts()
output:
(B, A) 2
(D, C, A) 2
(A) 1
Name: Product, dtype: int64
Alternative using sorted tuples:
df.groupby('ID')['Product'].agg(lambda x: tuple(sorted(x))).value_counts()
output:
(A, B) 2
(A, C, D) 2
(A,) 1
Name: Product, dtype: int64
Or strings:
df.groupby('ID')['Product'].agg(lambda x: ','.join(sorted(x))).value_counts()
output:
A,B 2
A,C,D 2
A 1
Name: Product, dtype: int64