Joining large tables in ClickHouse: out of memory or slow


I have 3 large tables (>100 GB with millions of rows each): events, page_views, and sessions. They are connected via 1-n relationships (see the table setup below). I'm trying to create a denormalized events_wide table that contains one row per event, with the corresponding page_views and sessions columns included as well. The idea is to eliminate the joins needed for complex analytics queries, since those joins are slow.

I created a materialized view events_mv which joins the page_views and sessions table to the events table. Whenever a new event is inserted into events, the materialized view should insert a row into events_wide, joining the page_view and session automatically. However, when I insert a single new event, the query either doesn't finish or terminates with an out of memory error.

Even running this simple join query from events to page_views results in an out of memory error: Memory limit (for user) exceeded: would use 99.21 GiB. I use a ClickHouse Cloud production instance with 24 GB RAM:

SELECT
    -- Select columns from events and page_views
FROM events AS e
LEFT JOIN page_views AS p ON p.property_id = e.property_id AND p.id = e.page_view_id
LIMIT 3;

I tried different primary key orderings for the 3 tables ((property_id, created_at, id) vs (property_id, id, created_at)), different join algorithms (partial_merge, auto, grace_hash), and ANY LEFT JOIN, all without success. Maybe using UUIDs instead of numeric IDs is part of the problem, but unfortunately I can't change the UUIDs.
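For example, one of the attempted variations looked roughly like this (column list abbreviated, the algorithm name is just one of the values tried):

SELECT
    -- Select columns from events and page_views
FROM events AS e
ANY LEFT JOIN page_views AS p ON p.property_id = e.property_id AND p.id = e.page_view_id
LIMIT 3
SETTINGS join_algorithm = 'grace_hash';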

This is my table setup with the (property_id, id, created_at) primary keys:

CREATE TABLE events
(
    id UUID,
    created_at DateTime('UTC'),
    property_id Int,
    page_view_id Nullable(UUID),
    session_id Nullable(UUID),
    ...
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(created_at)
PRIMARY KEY (property_id, id, created_at)
ORDER BY (property_id, id, created_at);

CREATE TABLE page_views
(
    id UUID,
    created_at DateTime('UTC'),
    modified_at DateTime('UTC'),
    property_id Int,
    session_id Nullable(UUID),
    ...
) ENGINE = ReplacingMergeTree(modified_at)
PARTITION BY toYYYYMM(created_at)
PRIMARY KEY (property_id, id, created_at)
ORDER BY (property_id, id, created_at);

CREATE TABLE sessions
(
    id UUID,
    created_at DateTime('UTC'),
    modified_at DateTime('UTC'),
    property_id Int,
    ...
) ENGINE = ReplacingMergeTree(modified_at)
PARTITION BY toYYYYMM(created_at)
PRIMARY KEY (property_id, id, created_at)
ORDER BY (property_id, id, created_at);


CREATE TABLE events_wide
(
    id UUID,
    created_at DateTime('UTC'),
    property_id Int,
    page_view_id Nullable(UUID),
    session_id Nullable(UUID),
    ...
    -- page_views columns
    p_created_at DateTime('UTC'),
    p_modified_at DateTime('UTC'),
    ...
    -- sessions columns
    s_created_at DateTime('UTC'),
    s_modified_at DateTime('UTC'),
    ...
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
PRIMARY KEY (property_id, created_at)
ORDER BY (property_id, created_at, id);


CREATE MATERIALIZED VIEW events_mv TO events_wide AS
SELECT
    e.id AS id,
    e.created_at AS created_at,
    e.session_id AS session_id,
    e.property_id AS property_id,
    e.page_view_id AS page_view_id,
    ...
    -- page_views columns
    p.created_at AS p_created_at,
    p.modified_at AS p_modified_at,
    ...
    -- sessions columns
    s.created_at AS s_created_at,
    s.modified_at AS s_modified_at,
    ...
FROM events AS e
LEFT JOIN page_views AS p ON p.property_id = e.property_id AND p.id = e.page_view_id
LEFT JOIN sessions AS s ON s.property_id = e.property_id AND s.id = e.session_id
SETTINGS join_algorithm = 'partial_merge';

CodePudding user response:

To avoid the expensive joins in this case, what about keeping the single denormalized super table events_wide with all the columns, but filling it from three different materialized views?

Each materialized view would insert its source table's columns into events_wide, filling the columns that source doesn't have with zeros or nulls.

For instance:

CREATE MATERIALIZED VIEW events_to_events_wide TO events_wide AS
SELECT
    e.id AS id,
    e.created_at AS created_at,
    e.session_id AS session_id,
    e.property_id AS property_id,
    e.page_view_id AS page_view_id,
    ...
    -- page_views columns as null or zeros
    
    ...
    -- sessions columns as null or zeros
    
    ...
FROM events AS e;

CREATE MATERIALIZED VIEW page_views_to_events_wide TO events_wide AS
SELECT
    p.id AS page_view_id,
    p.property_id AS property_id,
    p.session_id AS session_id,
    p.created_at AS p_created_at,
    p.modified_at AS p_modified_at,
    ...
    -- events columns as null or zeros

    ...
    -- sessions columns as null or zeros

    ...
FROM page_views AS p;

...

Then you have a single table with all records, which you can aggregate or analyze as needed without any joins.
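As a rough illustration (assuming the events_wide columns from the question), an analytics query then only reads a single table:

-- Illustrative join-free query: row counts per property and month straight
-- from the combined table. Because rows from events, page_views and sessions
-- are mixed in one table, real queries would additionally filter or branch on
-- which columns a given source populates.
SELECT
    property_id,
    toYYYYMM(created_at) AS month,
    count() AS rows_in_month
FROM events_wide
GROUP BY property_id, month
ORDER BY property_id, month;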

CodePudding user response:

To keep the memory footprint manageable, you can try breaking the JOIN process down into two separate join operations. You can achieve this by chaining materialized views:

  1. First join events with page_views in a dedicated MV, writing into an intermediate table, let's say events_with_pv
  2. Then join events_with_pv with sessions in a second MV that writes into the final events_wide table

You can use join_algorithm='auto' and let ClickHouse decide which algorithm to use. Materializing the intermediate state reduces the size of the state needed to compute each JOIN. Something to keep in mind is that you want the right-hand table of your JOINs to be the smaller one.
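A minimal sketch of the chaining, assuming an intermediate events_with_pv table that holds the events columns plus the p_-prefixed page_views columns (the names events_with_pv, events_with_pv_mv and events_wide_mv are illustrative, not from the question):

-- Step 1: on every insert into events, join the new rows with page_views
-- and write the result into the intermediate events_with_pv table
CREATE MATERIALIZED VIEW events_with_pv_mv TO events_with_pv AS
SELECT
    e.id AS id,
    e.created_at AS created_at,
    e.property_id AS property_id,
    e.session_id AS session_id,
    e.page_view_id AS page_view_id,
    p.created_at AS p_created_at,
    p.modified_at AS p_modified_at
FROM events AS e
LEFT JOIN page_views AS p ON p.property_id = e.property_id AND p.id = e.page_view_id;

-- Step 2: on every insert into events_with_pv, join those rows with sessions
-- and write the fully denormalized result into events_wide
CREATE MATERIALIZED VIEW events_wide_mv TO events_wide AS
SELECT
    i.*,
    s.created_at AS s_created_at,
    s.modified_at AS s_modified_at
FROM events_with_pv AS i
LEFT JOIN sessions AS s ON s.property_id = i.property_id AND s.id = i.session_id;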
