I have a pandas df as follows:
Date UserID
2022-01-01 ABC
2022-01-02 ABC
2022-01-03 ABC
2022-01-01 DEF
2022-01-05 DEF
2022-01-10 GHI
I want to group by year
and month
and count the number of unique
userID's that appear more than one time.
So my result would look like:
YEAR MONTH COUNT
2022 1 2
Please do note, that since ABC
and DEF
occurred more than once, they were counted, but since ````GHI``` didn't occur more than once, it was not counted.
I tried
df.group_by(['YR','MONTH'].agg({'USERID':'count'})
But this just does a count and nunique
counts entries that appear once as well.
I want to count entries that appear more than once only.
CodePudding user response:
Generate year and month columns :
temp = df.assign(YEAR = df.Date.dt.year, MONTH = df.Date.dt.month)
Filter for only duplicated values:
temp = temp.loc[temp.duplicated(subset = ['YEAR', "MONTH", 'UserID'])]
Groupby by year and month to get unique count:
(temp
.groupby(['YEAR','MONTH'], as_index = False)
.agg(COUNT=('UserID', 'nunique'))
)
YEAR MONTH COUNT
0 2022 1 2
CodePudding user response:
You can do:
s = df.groupby(['Year','Month'])['UserID'].value_counts().gt(1)
s = s.loc[s].groupby(['Year','Month']).value_counts()
s.name = 'COUNT'
result = pd.DataFrame(s).reset_index(level=2,drop=True).reset_index()
Output:
Year Month COUNT
0 2022 1 2