Home > Software engineering >  count of rows based on multiple conditions/partitions on the same table
count of rows based on multiple conditions/partitions on the same table

Time:07-23

DBFiddle Link: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2d7e9a4ddfdc8fb619a8dfc76d767950

Hi. I have one table called 'Model Versions' which has the fields and records as below.

intent_id intent_name version version_created_at client sentence
1 a_intent 1 2021-01-01 es_client1 sentence_1
1 a_intent 1 2021-01-01 es_client1 sentence_2
1 a_intent 1 2021-01-01 es_client1 sentence_3
2 b_intent 2 2021-02-01 es_client1 sentence_1
2 b_intent 2 2021-02-01 es_client1 sentence_2
2 b_intent 2 2021-02-01 es_client1 sentence_3
3 c_intent 3 2021-03-01 es_client1 sentence_1
3 c_intent 3 2021-03-01 es_client1 sentence_2
4 d_intent 4 2021-04-01 es_client1 sentence_1
4 d_intent 4 2021-04-01 es_client1 sentence_2
5 e_intent 5 2021-05-01 es_client1 sentence_1
6 g_intent 1 2021-01-01 es_client2 sentence_1
6 g_intent 1 2021-01-01 es_client2 sentence_2
7 h_intent 2 2021-03-01 es_client2 sentence_1
7 h_intent 2 2021-03-01 es_client2 sentence_2
7 h_intent 2 2021-03-01 es_client2 sentence_3
8 i_intent 3 2021-04-01 es_client2 sentence_1
8 i_intent 3 2021-04-01 es_client2 sentence_2
9 j_intent 4 2021-05-01 es_client2 sentence_1
9 j_intent 4 2021-05-01 es_client2 sentence_2
10 k_intent 1 2021-01-01 es_client3 sentence_1
10 k_intent 1 2021-01-01 es_client3 sentence_2
11 k_intent 2 2021-06-01 es_client3 sentence_1
11 k_intent 2 2021-06-01 es_client3 sentence_2
12 k_intent 3 2021-07-01 es_client3 sentence_1
12 k_intent 3 2021-07-01 es_client3 sentence_2
13 k_intent 4 2021-08-01 es_client3 sentence_1
13 k_intent 4 2021-08-01 es_client3 sentence_2
14 k_intent 5 2021-10-01 es_client3 sentence_1
14 k_intent 5 2021-10-01 es_client3 sentence_2

Expected Output:

I wanted to get the top 3 versions of each client along with their respective sentence count. My expected output looks like below:

client version total_count_of_sentences_per_version version_created_at
es_client1 5 1 2021-05-01
es_client1 4 2 2021-04-01
es_client1 3 2 2021-03-01
es_client2 4 2 2021-05-01
es_client2 3 2 2021-04-01
es_client2 2 3 2021-03-01
es_client3 5 2 2021-10-01
es_client3 4 2 2021-08-01
es_client3 3 2 2021-06-01

I tried writing a query with multiple CTEs and Partition By's. But none worked out. Seeking your help to achieve this.

DBFiddle Link: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2d7e9a4ddfdc8fb619a8dfc76d767950

CodePudding user response:

You did not specify which of the top 3 version you wish to fetch. I'll assume you want to retrieve the 3 latest versions, based on creation date.

My suggestion is to use a ROW_NUMBER() for each client in a windowed function, and to filter the top 3 rows.

For instance :

with cte as(
  select 
   client,
   version,
   version_created_at,
   count(Sentence) total_count_of_sentences_per_version,
   row_number() over(partition by client order by version_created_at desc) version_row_number
 from model_versions
 group by
   client,version,
   version_created_at
 )
select
  client,
  version,
  total_count_of_sentences_per_version,
  version_created_at
from cte
where version_row_number <=3

Try it online

CodePudding user response:

You can try this:

WITH main_tab
     AS (SELECT client,
                version,
                Count(*)
                  OVER (
                    partition BY client, version),
                Min(version_created_at)
                  OVER (
                    partition BY client, version),
                Dense_rank()
                  OVER (
                    partition BY client
                    ORDER BY version DESC) rn
         FROM   model_versions)
SELECT DISTINCT m.*
FROM   main_tab m;
  • Related