I have the following table <state_table> that tracks entries per minute of an id and its state:
minute id type
------ -- ----
1 A solid
1 A solid
1 A solid
1 A liquid
1 B solid
1 B solid
1 B liquid
.... 1000 rows ...
1 ZZX liquid
1 ZZZ liquid
2 A solid
2 A solid
2 A liquid
With the following query, I can get the top 1000 pairs based on occurrence:
With TempIds AS (
SELECT
state_table.minute as minute,
state_table.id as id,
COUNT(*)
FROM
state_table
GROUP BY 1,2
) SELECT
TempId.minute,
TempId.id,
TempId.count
FROM
TempIds
ORDER BY 3 DESC
LIMIT 1000
;
e.g.
minute id count
------ -- ----
2 B 1002
3 A 990
1 C 800
3 B 798
How can I modify my query to get the type of the id? For example, there are 1002 <minute=2,id=B> rows. Is there a way to get there are 402 solids and 600 liquids?
minute id count type
------ -- ---- -----
2 B 402 solid
2 B 600 liquid
3 A 330 solid
3 A 660 liquid
The only way I can think of is a fairly complex nested query:
With TempTop AS (
With TempIds AS (
SELECT
state_table.minute as minute,
state_table.id as id,
COUNT(*)
FROM
state_table
GROUP BY 1,2
) SELECT
TempId.minute as minute,
TempId.id as id,
TempId.count
FROM
TempIds
ORDER BY 3 DESC
LIMIT 1000
)
) SELECT
state_table.minute,
state_table.id,
state_table.type,
COUNT(*)
FROM
state_table, TempTop
WHERE
state_table.minute = TempTop.minute
AND state_table.id = TempTop.id
;
Is there a simpler way to make this query? Goal:
- For the Top 1000 most frequent pairs, get the breakdown of the type.
CodePudding user response:
Your query is
SELECT minute, id, COUNT(*)
FROM state_table
GROUP BY minute, id
ORDER BY COUNT(*) DESC
LIMIT 1000;
and in this process you lose the types, because you want the 1000 top minute/id pairs, so you cannot simply group by minute, id, type
instead.
If this is only about the types 'solid' and 'liquid', you can apply conditional aggregation to get the separate counts along:
SELECT
minute, id,
COUNT(*) AS total,
SUM(type = 'solid') AS solid,
SUM(type = 'liquid') AS liquid
FROM state_table
GROUP BY minute, id
ORDER BY COUNT(*) DESC
LIMIT 1000;
Summing up the boolean expressions works in MySQL, because true equals 1 and false equals 0 there.
The problem with the above queries are ties, by the way. If there are two pairs with the same 1000th count, you pick one arbitrarily instead of showing only 999 or 1001 pairs then in order to treat both tying pairs the same. So I'd probably re-write the queries using DENSE_RANK
in order to properly handle ties.
For the more complicated case where the types are unknown at the point of writing the query, you need rows instead of columns, just as already shown in your request. In that case you really need to group by minute, id, type
first. The easiest way to get the total counts is with SUM OVER
then. Then rank the pairs as mentioned with DENSE_RANK
and keep the top 1000.
SELECT minute, id, type, cnt
FROM
(
SELECT
minute, id, type, cnt,
DENSE_RANK() OVER (ORDER BY total DESC) AS rnk
FROM
(
SELECT
minute, id, type,
COUNT(*) AS cnt,
SUM(COUNT(*)) OVER(PARTITION BY minute, id) AS total
FROM state_table
GROUP BY minute, id, type
) counted
) ranked
WHERE rnk <= 1000
ORDER BY rnk, minute, id, type;
This can get you more than 1000 minute/id pairs in case of ties. You can reduce this with RANK
instead of DENSE_RANK
. If these two approaches still don't get the number you want, you may have to count minute/id pairs separately in a subquery instead.
CodePudding user response:
Best way I can think of is, you still have to use 2 CTEs but not nested ones.
You have 3 tasks to achieve,
CTE will count the records.
second one will use row_number() window function to assign row numbers based on decreasing order or count.
last select will do a join between state table and second cte with a where clause specifying row_nunber <= 1000 and grouping based on ID , State and Minute.
With TempIdsCount AS (
SELECT
state_table.minute as minute,
state_table.id as id,
COUNT() rw_cnt
FROM
state_table GROUP BY 1,2 ) TempIdsRowNum AS (
SELECT
minute,
id,
Row_number() over ( order by rw_cnt desc ) rw_num
FROM
TempIdsCount )
SELECT
st.minute,
st.id, st.type,
Count() cnt FROM
state_table st Join TempIdsRowNum trn on st.minute = trn.minute and st.id = trn.id Where rw_num <= 1000 Group by st.minute, st.id, st.type ;