Home > front end >  group by and count unique entries with more than one entry in time period
group by and count unique entries with more than one entry in time period

Time:05-13

I have a pandas df as follows:

Date            UserID
2022-01-01       ABC
2022-01-02       ABC
2022-01-03       ABC
2022-01-01       DEF
2022-01-05       DEF 
2022-01-10       GHI

I want to group by year and month and count the number of unique userID's that appear more than one time. So my result would look like:

YEAR   MONTH   COUNT
2022    1       2

Please do note, that since ABC and DEF occurred more than once, they were counted, but since ````GHI``` didn't occur more than once, it was not counted. I tried

df.group_by(['YR','MONTH'].agg({'USERID':'count'})

But this just does a count and nunique counts entries that appear once as well.

I want to count entries that appear more than once only.

CodePudding user response:

Generate year and month columns :

temp =  df.assign(YEAR = df.Date.dt.year, MONTH = df.Date.dt.month)

Filter for only duplicated values:

temp = temp.loc[temp.duplicated(subset = ['YEAR', "MONTH", 'UserID'])]

Groupby by year and month to get unique count:

(temp
.groupby(['YEAR','MONTH'], as_index = False)
.agg(COUNT=('UserID', 'nunique'))
)

   YEAR  MONTH  COUNT
0  2022      1      2

CodePudding user response:

You can do:

s = df.groupby(['Year','Month'])['UserID'].value_counts().gt(1)
s = s.loc[s].groupby(['Year','Month']).value_counts()
s.name = 'COUNT'
result = pd.DataFrame(s).reset_index(level=2,drop=True).reset_index()

Output:

    Year    Month   COUNT
0   2022    1       2
  • Related