I am having a table of 2 columns and N rows like below,
email | phone_num_list
----------------------
'a' | {"1", "2", "3"}
'a' | {"1", "4"}
'c' | {"5", "1", "6"}
'd' | {"3", "7", "1"}
where phone_num_list
is of type array.
My requirement is to get the most used phone numbers and their scores, where score = number of distinct email associated with the phone_num * total frequency of phone_num
Hence, for the above example, distinct emails associated with phone_num 1
are ["a", "c", "d"]
.
Hence, the score of phone_num 1
is 3(i.e total distinct emails) * 4(i.e total frequency of 1)
Calculation of score for all the phone_num is written below -
phone_num | distinct emails associated | freq of phone_num | final_score
-------------------------------------------------------------------------
1 | ["a", "c", "d"] | 4 | 4*3 = 12
2 | ["a"] | 1 | 1*1 = 1
3 | ["a", "d"] | 2 | 2*2 = 4
4 | ["a"] | 1 | 1*1 = 1
5 | ["c"] | 1 | 1*1 = 1
6 | ["c"] | 1 | 1*1 = 1
7 | ["d"] | 1 | 1*1 = 1
My desired output is ->
phone | score
-------------
1 | 12
3 | 4
2 | 1
4 | 1
5 | 1
6 | 1
7 | 1
Please help me with the query in PostgreSql. Thanks
CodePudding user response:
Preparing the test case:
create temporary table t (email text, phone_num_list text[]);
insert into t(email, phone_num_list) values
('a', '{"1", "2", "3"}'),
('a', '{"1", "4"}'),
('c', '{"5", "1", "6"}'),
('d', '{"3", "7", "1"}');
'Normalize' the table into nt
CTE and then calculate the frequency times the number of distinct emails per phone number.
with nt as (select email, unnest(phone_num_list) as phone from t)
select phone, count(*) * count(distinct email) as score
from nt group by phone order by score desc;
phone|score|
----- -----
1 | 12|
3 | 4|
5 | 1|
6 | 1|
7 | 1|
4 | 1|
2 | 1|
CodePudding user response:
You can use "unnest" function.
select phone_number,
count(phone_number) * count(distinct email) as score
from
(
select email, unnest(phone_num_list) as phone_number
from t1
)z
group by 1
order by 2 desc
db-fiddle here: