Home > Blockchain >  How to count the combinations of unique values per group in pandas?
How to count the combinations of unique values per group in pandas?

Time:04-12

I have a table with sessions and want to count the session that the userID goes through.

1 session and 1 count of sessions.

date userID session
2022-01-01 1 facebook
2022-01-02 1 google
2022-01-03 1 facebook
2022-01-03 1 google
2022-01-01 2 facebook
2022-01-02 2 google
2022-01-03 2 tiktok
2022-01-03 3 tiktok

output is

session count
tiktok 1
facebook google 1
facebook google tiktok 1

So the count is based on a unique user with the session that the user id has been exposed to.

CodePudding user response:

IIUC, you want to find the unique combinations of sites, without duplicates (but it remains unclear whether order is important).

In order (A->B is different than B->A):
(df.groupby('userID')['session']
   .agg(lambda x: tuple(dict.fromkeys(x)))
   .value_counts()
)

output:

(facebook, google)            1
(facebook, google, tiktok)    1
(tiktok,)                     1
Name: session, dtype: int64

or, to illustrate the order:

(df.groupby('userID')['session']
   .agg(lambda x: '->'.join(dict.fromkeys(x)))
   .value_counts()
)

output:

facebook->google            1
facebook->google->tiktok    1
tiktok                      1
Name: session, dtype: int64
Order independent (A->B would count as B->A):
(df.groupby('userID')['session']
   .agg(frozenset)
   .value_counts()
)

output:

(facebook, google)            1
(facebook, tiktok, google)    1
(tiktok)                      1
Name: session, dtype: int64
  • Related