I have a table with sessions and want to count the session that the userID goes through.
1 session and 1 count of sessions.
date | userID | session |
---|---|---|
2022-01-01 | 1 | |
2022-01-02 | 1 | |
2022-01-03 | 1 | |
2022-01-03 | 1 | |
2022-01-01 | 2 | |
2022-01-02 | 2 | |
2022-01-03 | 2 | tiktok |
2022-01-03 | 3 | tiktok |
output is
session | count |
---|---|
tiktok | 1 |
facebook google | 1 |
facebook google tiktok | 1 |
So the count is based on a unique user with the session that the user id has been exposed to.
CodePudding user response:
IIUC, you want to find the unique combinations of sites, without duplicates (but it remains unclear whether order is important).
In order (A->B is different than B->A):
(df.groupby('userID')['session']
.agg(lambda x: tuple(dict.fromkeys(x)))
.value_counts()
)
output:
(facebook, google) 1
(facebook, google, tiktok) 1
(tiktok,) 1
Name: session, dtype: int64
or, to illustrate the order:
(df.groupby('userID')['session']
.agg(lambda x: '->'.join(dict.fromkeys(x)))
.value_counts()
)
output:
facebook->google 1
facebook->google->tiktok 1
tiktok 1
Name: session, dtype: int64
Order independent (A->B would count as B->A):
(df.groupby('userID')['session']
.agg(frozenset)
.value_counts()
)
output:
(facebook, google) 1
(facebook, tiktok, google) 1
(tiktok) 1
Name: session, dtype: int64