Home > Back-end >  Group rows based on same column values in other table
Group rows based on same column values in other table

Time:11-11

We have two tables:

  1. keywords table
 id |   phrase   | volume | difficulty
---- ------------ -------- ------------
  1. urls table:
 id |                        url                          keyword_id
---- ---------------------------------------------------- ------------

URLs table has a foreign key to keywords table. Each keyword has multiple URLs. Now we want to group keywords that have more than 3 rows from urls table with matching url column value. We are not able to get proper results using group by clause. What would be the best way to achieve the result?

CodePudding user response:

Trying to reformulate the question :

How to group the keywords which have in common at least 4 urls in table urls ?

for instance grouping keyword1 associated to (url1,url2,url3,url4,url5,url6) and keyword2 associated to (url2,url4,url5,url6,url7) because both keywords share the list of url (url2,url4,url5,url6).

To do so, you can try with a recursive cte :

WITH RECURSIVE list AS
-- group the keyword_id by url
-- remove the urls associated to only one keyword_id
-- assign a rank id to the resulting rows
(
SELECT array[url] AS url_array
     , array_agg(DISTINCT keyword_id) AS keyword_array
     , row_number() OVER () AS rank
  FROM urls
 GROUP BY url
HAVING array_length(keyword_array, 1) > 1
)
, consolidated_list (url_array, keyword_array, rank) AS
-- iterate on the keyword_arrays whose keyword_id have 1, then 2, then 3, then 4 common urls
-- at every iteration, reduce the keyword_array to the elements which have the new added url in common
(
SELECT *
  FROM list
UNION ALL
SELECT cl.url_array || l.url_array
     , c.keyword_array
     , l.rank
  FROM list AS l
 INNER JOIN consolidated_list AS cl
    ON cl.keyword_array && l.keyword_array
   AND NOT cl.url_array @> l.url_array
   AND l.rank > cl.rank  -- to avoid redundant rows with url_arrays with the same set of elements in different orders
 CROSS JOIN LATERAL  -- intersect the arrays of keyword_ids so that to keep only th common keyword_ids
     ( SELECT array_agg(l.kw) AS keyword_array 
         FROM unnest(cl.keyword_array) AS cl(kw)
        INNER JOIN unnest(l.keyword_array) AS l(kw)
           ON l.kw = cl.kw
     ) AS c
 WHERE array_length(cl.url_array, 1) < 4 -- stop the iterations after the 4th loop even if some keyword_ids may have more than 4 urls in common
)
SELECT keyword_array, url_array
  FROM consolidated_list
 WHERE array_length(url_array, 1) = 4 ;  -- only keep the keywords groups with 4 (or more) common urls

The result is a list of arrays of keyword_ids which have in common at least the 4 urls of the associated url_array.

  • Related