Home > Software engineering >  How to add rows into table by condition in ClickHouse?
How to add rows into table by condition in ClickHouse?

Time:12-22

I have a table in ClickHouse with events of connecting and disconnecting to system. Query select timestamp, username, event from table gives following result.

timestamp username event
December 20, 2022, 18:24 1 Connect
December 20, 2022, 18:30 1 Disconnect
December 20, 2022, 18:34 1 Connect
December 21, 2022, 12:07 1 Disconnect
December 20, 2022, 12:15 2 Connect
December 20, 2022, 12:47 2 Disconnect

The session must be show in table as finished by the end of the day. If user was connected to system on 20th December and had no "Disconnect" after that in the same day, I have to add such 'Disconnect' event at the table with some query. And also I have to add row with event of 'Connect' to the table at 00:00 of next day. For example, in sample table you can see that user #1 had not finished session on 20th December, so I want to have the following result:

timestamp username event
December 20, 2022, 18:24 1 Connect
December 20, 2022, 18:30 1 Disconnect
December 20, 2022, 18:34 1 Connect
December 20, 2022, 23:59 1 Disconnect
December 21, 2022, 00:00 1 Connect
December 21, 2022, 12:07 1 Disconnect
December 20, 2022, 12:15 2 Connect
December 20, 2022, 12:47 2 Disconnect

Is there any way to amend the query so it will work as I described above? ClickHouse is not so common as Posrgres or SQL Server, as far as I know, so code in Postgres dialect will be fine, I will find out how to make the same with ClickHouse.

CodePudding user response:

First identify these Disconnect events that need to be preceded by a Disconnect/Connect pair. This is t CTE, overnight attribute. Then insert a Disconnect/Connect pair into the_table for every record of t with overnight true.

with t as
(
 select *, 
   "timestamp"::date > lag("timestamp") over (partition by username order by "timestamp")::date overnight 
 from the_table
 where "event" = 'Disconnect'
)
insert into the_table ("timestamp", "username", "event")
 select ts, un, ev 
 from t 
 cross join lateral
 (
  values (date_trunc('day', "timestamp") - interval '1 second', "username", 'Disconnect'),
         (date_trunc('day', "timestamp"), "username", 'Connect')
 ) as l(ts, un, ev)
 where overnight; 

DB-fiddle demo

CodePudding user response:

You do not need lateral join to achieve desired result in Clickhouse (JOINs in Clickhouse are always compute heavy operations since it is a columnar store. ClickHouse takes the right table and creates a hash table for it in RAM).

You can use UNION ALL and ARRAY JOIN in specific way to generate missing rows:

CREATE TABLE connections
(
    `timestamp` DateTime,
    `username` LowCardinality(String),
    `event` enum('Connect', 'Disconnect')
)
ENGINE = Memory;

INSERT INTO connections VALUES
('2022-12-20 18:24:00'::DateTime,   '1',    'Connect')
('2022-12-20 18:30:00'::DateTime,   '1',    'Disconnect')
('2022-12-20 18:34:00'::DateTime,   '1',    'Connect')
('2022-12-21 12:07:00'::DateTime,   '1',    'Disconnect')
('2022-12-20 12:15:00'::DateTime,   '2',    'Connect')
('2022-12-20 12:47:00'::DateTime,   '2',    'Disconnect');

SELECT * FROM 
  (
    SELECT 
      timestamp, username, event 
    FROM 
      connections 
    UNION ALL 
    SELECT 
      timestamp, username, event 
    FROM 
      (
        SELECT 
          [toStartOfDay(timestamp)   INTERVAL '1 DAY' - INTERVAL '1 SECOND', 
          toStartOfDay(timestamp)   INTERVAL '1 DAY' ] timestamps, 
          username, 
          [ 'Disconnect', 'Connect' ] :: Array(Enum('Connect', 'Disconnect')) events 
        FROM 
          connections 
        GROUP BY 
          toStartOfDay(timestamp), username 
        HAVING 
          anyLast(event) = 'Connect'
      ) ARRAY JOIN 
        timestamps AS timestamp, 
        events AS event
  ) 
ORDER BY 
  username, timestamp

Here is the result:

┌───────────timestamp─┬─username─┬─event──────┐
│ 2022-12-20 18:24:00 │ 1        │ Connect    │
│ 2022-12-20 18:30:00 │ 1        │ Disconnect │
│ 2022-12-20 18:34:00 │ 1        │ Connect    │
│ 2022-12-20 23:59:59 │ 1        │ Disconnect │
│ 2022-12-21 00:00:00 │ 1        │ Connect    │
│ 2022-12-21 12:07:00 │ 1        │ Disconnect │
│ 2022-12-20 12:15:00 │ 2        │ Connect    │
│ 2022-12-20 12:47:00 │ 2        │ Disconnect │
└─────────────────────┴──────────┴────────────┘

8 rows in set. Elapsed: 0.011 sec.
  • Related