Home > OS >  How to get the frequency of each element in array from all rows in table in PostgreSql
How to get the frequency of each element in array from all rows in table in PostgreSql

Time:11-02

I am having a table of 2 columns and N rows like below,

email | phone_num_list
----------------------
'a'   | {"1", "2", "3"}
'a'   | {"1", "4"}
'c'   | {"5", "1", "6"}
'd'   | {"3", "7", "1"}

where phone_num_list is of type array.

My requirement is to get the most used phone numbers and their scores, where score = number of distinct email associated with the phone_num * total frequency of phone_num

Hence, for the above example, distinct emails associated with phone_num 1 are ["a", "c", "d"].

Hence, the score of phone_num 1 is 3(i.e total distinct emails) * 4(i.e total frequency of 1)

Calculation of score for all the phone_num is written below -

phone_num | distinct emails associated | freq of phone_num | final_score
-------------------------------------------------------------------------
   1      |     ["a", "c", "d"]        |      4            |   4*3 = 12
   2      |     ["a"]                  |      1            |   1*1 = 1
   3      |     ["a", "d"]             |      2            |   2*2 = 4
   4      |     ["a"]                  |      1            |   1*1 = 1
   5      |     ["c"]                  |      1            |   1*1 = 1
   6      |     ["c"]                  |      1            |   1*1 = 1
   7      |     ["d"]                  |      1            |   1*1 = 1

My desired output is ->

phone | score
-------------
 1    | 12
 3    | 4
 2    | 1 
 4    | 1
 5    | 1
 6    | 1
 7    | 1 

Please help me with the query in PostgreSql. Thanks

CodePudding user response:

Preparing the test case:

create temporary table t (email text, phone_num_list text[]);
insert into t(email, phone_num_list) values
('a', '{"1", "2", "3"}'),
('a', '{"1", "4"}'),
('c', '{"5", "1", "6"}'),
('d', '{"3", "7", "1"}');

'Normalize' the table into nt CTE and then calculate the frequency times the number of distinct emails per phone number.

with nt as (select email, unnest(phone_num_list) as phone from t)
select phone, count(*) * count(distinct email) as score
from nt group by phone order by score desc;
phone|score|
----- ----- 
1    |   12|
3    |    4|
5    |    1|
6    |    1|
7    |    1|
4    |    1|
2    |    1|

CodePudding user response:

You can use "unnest" function.

    select phone_number, 
    count(phone_number) * count(distinct email) as score
    from
        (
        select email, unnest(phone_num_list) as phone_number
        from t1
        )z
    group by 1
    order by 2 desc

db-fiddle here:

  • Related