I have a table with logs of users who used certain service. Something like the table below, each row is timestamp of activity and user id.
user_id | timestamp |
---|---|
831 | 2022-06-22 04:37:10 |
789 | 2022-06-22 12:38:57 |
831 | 2022-06-22 16:40:10 |
I want to calculate number of unique users in each day, but not just in that day, but include a week prior. Basically, moving window unique count: for day "x" count should be in window of "x-7 days":"x".
As I see in docs,
INTERVAL syntax for DateTime RANGE OFFSET frame: not supported, specify the number of seconds instead (RANGE works with any numeric type).
easy way of using interval with passing something like RANGE INTERVAL 7 day PRECEDING
is not supported, and they suggest to use range with passing seconds, but I don't really have experience with range in sql, so I don't really get how do you pass seconds there. My current code:
with cleaned_table as (
select
user_id,
date_trunc('day', timestamp) as day
from
table
)
SELECT
day,
uniqExact(user_id) OVER (
PARTITION by day ORDER BY day range ???
)
FROM
cleaned_table
Also, ideally, I have a feeling that I should add group by
somewhere since I need only one row per each day, not a row for each row in initial table, and without grouping I'm doing recalculation(?) for each row instead of calculating for each day once.
CodePudding user response:
I'll mark this as an answer, but I'd be happy if anyone knows how to optimize this solution by incorporating group by
or other methods to not recalculate window function for each row and calculate it only once for each day.
Anyway, RANGE BETWEEN 6 PRECEDING and current row
is what I was searching for, worked just fine. Also added ::date
to cast timestamp into date dtype, and DISTINCT day
allows directly picking only one row for each day instead of running group by
with any
one more time.
with cleaned_table as (
select
user_id,
date_trunc('day', timestamp)::date as day
from
table
)
SELECT
DISTINCT day,
uniqExact(user_id) OVER (
ORDER BY
day ASC RANGE BETWEEN 6 PRECEDING
and current row
) as users
FROM
cleaned_table
CodePudding user response:
create table t(user_id Int64, timestamp DateTime) Engine = Memory as select * from values((831, '2022-06-22 04:37:10'), (789,'2022-06-22 12:38:57'), (831,'2022-06-22 16:40:10'), (1,'2022-06-21 12:38:57'), (2,'2022-06-20 16:40:10'));
SELECT
day,
finalizeAggregation(u) AS uniqByDay,
uniqMerge(u) OVER (ORDER BY day ASC RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS uniqBy6Days
FROM
(
SELECT
toDate(timestamp) AS day,
uniqState(user_id) AS u
FROM t
GROUP BY day
)
ORDER BY day ASC
┌────────day─┬─uniqByDay─┬─uniqBy6Days─┐
│ 2022-06-20 │ 1 │ 1 │
│ 2022-06-21 │ 1 │ 2 │
│ 2022-06-22 │ 2 │ 4 │
└────────────┴───────────┴─────────────┘