I have Id, event_name, Timestamp columns.
I will concatenate last 5 event of every id.
I've used multiple WITH and JOIN to acquaire this. But BigQuery took so much time to compute.
Also this feels like such a bad practice. What can i use as an alternative?
My table looks something like this
ID | event_name | timestamp |
---|---|---|
A1 | a | 2022-10-21 12:10:00 UTC |
A1 | b | 2022-10-21 12:12:00 UTC |
A1 | c | 2022-10-21 12:15:00 UTC |
A1 | d | 2022-10-21 12:16:00 UTC |
A1 | e | 2022-10-21 12:28:00 UTC |
A1 | f | 2022-10-21 12:45:00 UTC |
B2 | c | 2022-10-21 10:12:00 UTC |
B2 | f | 2022-10-21 11:12:00 UTC |
B2 | b | 2022-10-21 11:25:00 UTC |
B2 | a | 2022-10-21 11:26:00 UTC |
B2 | f | 2022-10-21 15:32:00 UTC |
B2 | c | 2022-10-21 15:32:48 UTC |
B2 | f | 2022-10-21 15:36:00 UTC |
My code looks like this.
WITH a AS ( id, timestamp, event_name, row_number() over(partition by ID ORDER BY timestamp DESC) as row_n
FROM my_table),
WITH b AS (id, timestamp, event_name, row_n
FROM a
WHERE row_n <= 5),
e1 AS(
SELECT ID, timestamp AS ev1
FROM b
WHERE row_n = 1),
e2 AS(
SELECT ID, timestamp AS ev2
FROM b
WHERE row_n = 2),
e3 AS(
SELECT ID, timestamp AS ev3
FROM b
WHERE row_n = 3),
e4 AS(
SELECT ID, timestamp AS ev4
FROM b
WHERE row_n = 4),
e5 AS(
SELECT ID, timestamp AS ev5
FROM b
WHERE row_n = 5),
concat_prep AS(
SELECT b.ID, ev1,ev2,ev3,ev4,ev5
FROM b
LEFT JOIN e1
ON b.ID = e1.ID
LEFT JOIN e2
ON e1.ID = e2.ID
LEFT JOIN e3
ON e2.ID = e3.ID
LEFT JOIN e4
ON e3.ID = e4.ID
LEFT JOIN e5
ON e4.ID= e5.ID)
SELECT ID, concat(ev1,',',ev2,',',ev3,',',ev4,',',ev5) as concatt
FROM concat_prep
GROUP BY ID ,concat(ev1,',',ev2,',',ev3,',',ev4,',',ev5)
And my output should look like this:
ID | concat |
---|---|
A1 | f,e,d,c,b |
B2 | f,c,f,a,b |
How can I optimize it? (I've already filtered by date) this query is part of a bigger query.
CodePudding user response:
Please build an array in a group by
. There is an option to limit the elements.
With tbl as (select * , rand() as timestamp from unnest(["A1","B2"]) ID,unnest(split("a b c d e f g"," ")) event_name)
SELECT
ID,
#array_Agg(event_name order by timestamp limit 5),
string_Agg(event_name order by timestamp desc limit 5) as concat
from tbl
group by 1
CodePudding user response:
Yet, you can use ROW_NUMBER()
window function while DENSE_RANK()
might rather be preferable as it arbitrarily would render including the ties(equality) of timestamp values.
So an option would be using :
WITH t AS
(
SELECT t.*, DENSE_RANK() OVER (PARTITION BY id ORDER BY timestamp DESC) AS dr
FROM my_table AS t
)
SELECT ID, STRING_AGG(event_name ORDER BY dr LIMIT 5) AS `concat`
FROM t
GROUP BY ID
where a comma separated string would be generated by default(unless and optional argument delimiter provided after expression [the column event_name in this case] ) by using STRING_AGG() concatenation function