Home > Software engineering >  How to select data with percentages from Clickhouse?
How to select data with percentages from Clickhouse?

Time:12-11

Given the following table:

CREATE TABLE main
(
    `job_id` UUID,
    `request_time` DateTime,
    `host_id` UInt8,
    `status_code` LowCardinality(String),
)
ENGINE = MergeTree
ORDER BY request_time
SETTINGS index_granularity = 8192

I'm trying to fetch all statuses by host, with their relative percentage. In order to do that, I need to count results grouped by host and status, and divide each counted field by its total host fields count.

For example, this query would have worked in MySQL:

SELECT
    main.host_id,
    main.status_code,
    COUNT() AS status_count,
    COUNT() / sub.host_total * 100 AS percent
FROM
    main
INNER JOIN (
    SELECT host_id, COUNT() AS host_total
    FROM main
    GROUP BY host_id
) AS sub ON (sub.host_id = main.host_id)
GROUP BY
    main.host_id,
    main.status_code

But ClickHouse throws:

DB::Exception: Unknown identifier: host_total; there are columns: host_id, status_code, count(): While processing host_id, status_code, count() AS status_count, (count() / host_total) * 100 AS percent. (UNKNOWN_IDENTIFIER)

Probably because correlated (dependent) subqueries are not supported.

I was suggested to use CTE, so I tried this:

WITH sub AS (
    SELECT host_id, COUNT() AS host_total
    FROM main
    GROUP BY host_id
)
SELECT
    main.host_id,
    main.status_code,
    COUNT() AS status_count,
    COUNT() / (SELECT host_total FROM sub WHERE sub.host_id = main.host_id) * 100 AS percent
FROM
    main
GROUP BY
    main.host_id,
    main.status_code

But still no luck:

DB::Exception: Missing columns: 'main.host_id' while processing query: 'SELECT host_total FROM sub WHERE host_id = main.host_id', required columns: 'host_total' 'host_id' 'main.host_id' 'host_total' 'host_id' 'main.host_id': While processing (SELECT host_total FROM sub WHERE sub.host_id = main.host_id) AS _subquery20: While processing count() / ((SELECT host_total FROM sub WHERE sub.host_id = main.host_id) AS _subquery20): While processing (count() / ((SELECT host_total FROM sub WHERE sub.host_id = main.host_id) AS _subquery20)) * 100 AS percent. (UNKNOWN_IDENTIFIER)

CodePudding user response:

CH throws an incorrect error. https://github.com/ClickHouse/ClickHouse/issues/4567

host_total should be in groupby section or under aggregate function

insert into main(request_time, host_id,status_code) values ( now(), 1, 200);
insert into main(request_time, host_id,status_code) values ( now(), 1, 500);

SELECT
    main.host_id,
    main.status_code,
    COUNT() AS status_count,
    COUNT() / any(sub.host_total) * 100 AS percent
FROM
    main
INNER JOIN (
    SELECT host_id, COUNT() AS host_total
    FROM main
    GROUP BY host_id
) AS sub ON (sub.host_id = main.host_id)
GROUP BY
    main.host_id,
    main.status_code

┌─host_id─┬─status_code─┬─status_count─┬─percent─┐
│       1 │ 200         │            1 │      50 │
│       1 │ 500         │            1 │      50 │
└─────────┴─────────────┴──────────────┴─────────┘

But there is better ways to solve it:

window functions

select   host_id, status_code, status_count, status_count / host_total  * 100 AS percent
from (
SELECT
    host_id,
    status_code,
    COUNT() AS status_count,
    COUNT() over (partition by host_id) host_total
from main    
GROUP BY
    host_id,
    status_code
);
┌─host_id─┬─status_code─┬─status_count─┬─percent─┐
│       1 │ 200         │            1 │      50 │
│       1 │ 500         │            1 │      50 │
└─────────┴─────────────┴──────────────┴─────────┘

arrays

select host_id, status_code, status_count, status_count / host_total  * 100 AS percent
from (
SELECT
    host_id,
    sumMap([status_code], [1]) ga,
    count() host_total
from main    
GROUP BY host_id
) array join ga.1 as status_code, ga.2 as status_count

┌─host_id─┬─status_code─┬─status_count─┬─percent─┐
│       1 │ 200         │            1 │      50 │
│       1 │ 500         │            1 │      50 │
└─────────┴─────────────┴──────────────┴─────────┘

  • Related