Background
I've got an events
table, simplified here for brevity:
event_time timestamp with time zone NOT NULL,
user_id character varying(100) NOT NULL,
... more
An example of an event is a user clicking something on a webpage. Many users will create many events. The sample size is just about 100,000 users, some of whom will use the web app once, some of whom might user it frequently, and others might use it in bursts (many times in a 2 week span, then nothing for 2 weeks, then 2 weeks again).
Some example data:
user_id | event_time
1 | 2022-06-20 00:00:00 00
2 | 2022-06-21 00:01:00 00
1 | 2022-06-24 00:00:00 00
1 | 2022-07-01 00:02:34 00
3 | 2022-07-01 00:03:45 00
1 | 2022-07-18 00:00:00 00
3 | 2022-07-19 01:00:00 00
Question
How would I write a query to identify the frequency of user_id
s that show up in many weeks?
Examples
The query would ideally return the count of user_id
s that show up at least once in a week.
one_occurrence | two_occurrences | three_occurrences | four_occurrences | more_than_four
1 | 1 | 1 | 0 | 0
ID 1
has four events but only three occurrences because:
1 - they clicked the page twice within the week of 6/20
1 - they clicked the page once the week of 7/01
1 - and finally, they clicked the page once the week of 7/18
= 3 weeks where user_id `1` clicked at least once.
Lastly, anything greater than 4 would be grouped together.
It's the same for the other two user_ids, but those are more straightforward.
CodePudding user response:
This is just assuming that you want a Sunday-based week. If you wanted a Monday then just use date_trunc()
without the adjustment.
It's a simple matter of grouping and pivoting once you calculate the week start and then counting the total number of such groups per user.
with data as (
select distinct user_id, count(*) over (partition by user_id) as week_count
from events
group by user_id, date_trunc('week', event_time::date 1)::date - 1;
)
select
count(case when week_count = 1 then 1 end) as one_occurrence,
count(case when week_count = 2 then 1 end) as two_occurrence,
count(case when week_count = 3 then 1 end) as three_occurrence,
count(case when week_count = 4 then 1 end) as four_occurrence,
count(case when week_count > 4 then 1 end) as more_than_four
from data