How to get the frequency of each element in array from all rows in table in PostgreSql-CodePudding

I am having a table of 2 columns and N rows like below,

email | phone_num_list
----------------------
'a'   | {"1", "2", "3"}
'a'   | {"1", "4"}
'c'   | {"5", "1", "6"}
'd'   | {"3", "7", "1"}

where phone_num_list is of type array.

My requirement is to get the most used phone numbers and their scores, where score = number of distinct email associated with the phone_num * total frequency of phone_num

Hence, for the above example, distinct emails associated with phone_num 1 are ["a", "c", "d"].

Hence, the score of phone_num 1 is 3(i.e total distinct emails) * 4(i.e total frequency of 1)

Calculation of score for all the phone_num is written below -

phone_num | distinct emails associated | freq of phone_num | final_score
-------------------------------------------------------------------------
   1      |     ["a", "c", "d"]        |      4            |   4*3 = 12
   2      |     ["a"]                  |      1            |   1*1 = 1
   3      |     ["a", "d"]             |      2            |   2*2 = 4
   4      |     ["a"]                  |      1            |   1*1 = 1
   5      |     ["c"]                  |      1            |   1*1 = 1
   6      |     ["c"]                  |      1            |   1*1 = 1
   7      |     ["d"]                  |      1            |   1*1 = 1

My desired output is ->

phone | score
-------------
 1    | 12
 3    | 4
 2    | 1 
 4    | 1
 5    | 1
 6    | 1
 7    | 1

Please help me with the query in PostgreSql. Thanks

CodePudding user response：

Preparing the test case:

create temporary table t (email text, phone_num_list text[]);
insert into t(email, phone_num_list) values
('a', '{"1", "2", "3"}'),
('a', '{"1", "4"}'),
('c', '{"5", "1", "6"}'),
('d', '{"3", "7", "1"}');

'Normalize' the table into nt CTE and then calculate the frequency times the number of distinct emails per phone number.

with nt as (select email, unnest(phone_num_list) as phone from t)
select phone, count(*) * count(distinct email) as score
from nt group by phone order by score desc;

phone|score|
----- ----- 
1    |   12|
3    |    4|
5    |    1|
6    |    1|
7    |    1|
4    |    1|
2    |    1|

CodePudding user response：

You can use "unnest" function.

    select phone_number, 
    count(phone_number) * count(distinct email) as score
    from
        (
        select email, unnest(phone_num_list) as phone_number
        from t1
        )z
    group by 1
    order by 2 desc

db-fiddle here: