Let's say we have a Sqlite table containing:
name;city;age;id;foo
Alice;New-York;25;13782749;12
Eve;Chicago;23;1938679;34
Bob;New-York;25;824697624;56
Jack;Denver;30;239679163;12
Simone;Denver;30;1687631;99
How to group by h=CONCAT(city,age)
but also add in the group the users that have a column foo
identical to someone in the group?
As seen in Group items by 2 columns, here is how to group by h
:
select dense_rank() over (order by city, age) as grpnum, name, id from t;
How to add this second condition on foo
?
Example: here Alice and Bob are in the same group1
because they have identical h=CONCAT(city,age)
, but Jack should also be in group1
because he has same foo value as Alice: 12
. Simone has same CONCAT(city,age)
than Jack, therefore she is in the same group as Jack, in group1
as well.
CodePudding user response:
This is much trickier. This is a graph-walking problem, which requires a recursive CTE (Common Table Expression).
The idea is to generate all the edges between names. Then use a recursive CTE to visit all connected nodes (name
s), avoiding visiting existing ones.
Here is one method:
with recursive edges as (
select distinct t1.name as name1, t2.name as name2
from t t1 join
t t2
on t1.city = t2.city and t1.age = t2.age or
t1.foo = t2.foo
),
cte as (
select name1, name2, min(name1, name2) as first_name,
',' || name1 || ',' || name2 || ',' as visited
from edges
union all
select cte.name1, e.name2, min(cte.first_name, e.name2),
visited || e.name2 || ','
from cte join
edges e
on e.name1 = cte.name2
where visited not like '%,' || e.name2 || ',%'
)
select t.*, cte.grpnum
from t join
(select name1, min(first_name), dense_rank() over (order by min(first_name)) as grpnum
from cte
group by name1
) cte
on t.name = cte.name1;
And here is a db<>fiddle.