Home > OS >  Count in time window for each present day in Clickhouse
Count in time window for each present day in Clickhouse

Time:01-11

I have a table with logs of users who used certain service. Something like the table below, each row is timestamp of activity and user id.

user_id timestamp
831 2022-06-22 04:37:10
789 2022-06-22 12:38:57
831 2022-06-22 16:40:10

I want to calculate number of unique users in each day, but not just in that day, but include a week prior. Basically, moving window unique count: for day "x" count should be in window of "x-7 days":"x".

As I see in docs,

INTERVAL syntax for DateTime RANGE OFFSET frame: not supported, specify the number of seconds instead (RANGE works with any numeric type).

easy way of using interval with passing something like RANGE INTERVAL 7 day PRECEDING is not supported, and they suggest to use range with passing seconds, but I don't really have experience with range in sql, so I don't really get how do you pass seconds there. My current code:

with cleaned_table as (
  select
    user_id,
    date_trunc('day', timestamp) as day
  from
    table
)
SELECT
  day,
  uniqExact(user_id) OVER (
    PARTITION by day ORDER BY day range ???
  )
FROM
  cleaned_table

Also, ideally, I have a feeling that I should add group by somewhere since I need only one row per each day, not a row for each row in initial table, and without grouping I'm doing recalculation(?) for each row instead of calculating for each day once.

CodePudding user response:

I'll mark this as an answer, but I'd be happy if anyone knows how to optimize this solution by incorporating group by or other methods to not recalculate window function for each row and calculate it only once for each day.

Anyway, RANGE BETWEEN 6 PRECEDING and current row is what I was searching for, worked just fine. Also added ::date to cast timestamp into date dtype, and DISTINCT day allows directly picking only one row for each day instead of running group by with any one more time.

with cleaned_table as (
  select
    user_id,
    date_trunc('day', timestamp)::date as day
  from
    table
)

SELECT
  DISTINCT day,
  uniqExact(user_id) OVER (
    ORDER BY
      day ASC RANGE BETWEEN 6 PRECEDING
      and current row
  ) as users
FROM
  cleaned_table

CodePudding user response:

create table t(user_id Int64, timestamp DateTime) Engine = Memory as select * from values((831, '2022-06-22 04:37:10'), (789,'2022-06-22 12:38:57'), (831,'2022-06-22 16:40:10'), (1,'2022-06-21 12:38:57'), (2,'2022-06-20 16:40:10'));


SELECT
    day,
    finalizeAggregation(u) AS uniqByDay,
    uniqMerge(u) OVER (ORDER BY day ASC RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS uniqBy6Days
FROM
(
    SELECT
        toDate(timestamp) AS day,
        uniqState(user_id) AS u
    FROM t
    GROUP BY day
)
ORDER BY day ASC

┌────────day─┬─uniqByDay─┬─uniqBy6Days─┐
│ 2022-06-20 │         1 │           1 │
│ 2022-06-21 │         1 │           2 │
│ 2022-06-22 │         2 │           4 │
└────────────┴───────────┴─────────────┘

see How to obtain p95 of a day and p95 of the last 7 days of that day from Clickhouse through an SQL query?

  • Related