I have some tables with properties of Name and Result each. i want to join tables by name similarity but limit the result arrived from each group of similarity into max 2 results.
CREATE TABLE data_a(
Id serial primary key,
Name VARCHAR(70) NOT NULL,
Result INT4 NOT NULL);
CREATE TABLE data_b(
Id serial primary key,
Name VARCHAR(70) NOT NULL,
Result INT4 NOT NULL);
INSERT INTO data_a
(Name, Result)
VALUES
('Todd', 2),
('John', 5);
INSERT INTO data_b
(Name, Result)
VALUES
('Johns', 5),
('Todi', 3),
('Tod', 4),
('Todd', 5),
('John', 1),
('Jon', 1),
('Johny', 1),
('Johnny', 1),
('Johni', 1);
i would like to run a query that join both tables by name similarity and limit results to up to 2 results
SELECT da.Name as Name_a,db.Name , similarity(da.Name,da.Name) > 0.5
FROM data_a da
JOIN data_b db
ON da.Name % db.Name
GROUP BY da.Name,db.Name
ORDER BY similarity
LIMIT 2
and recive
|Name_a|Name_b|similarity|
|------|------|----------|
|Todd | Todd | 1 |
|------|------|----------|
|Todd | Tod |0.8 |
|------|------|----------|
|John | John |1 |
|------|------|----------|
|John | Johny|0.76 |
|------|------|----------|
currently i get
|Name_a|Name_b|similarity|
|------|------|----------|
|Todd | Todd | 1 |
|John | John |1 |
it seems that i'm not using correctly in group by , how can i group by
CodePudding user response:
You can use lateral join to map each name to a name from the first table, then truncate the result to 2 values for each name.
with data AS (
select
da.name da_name,
db.name db_name,
similarity(da.name::text, db.name::text) similarity
from
data_a da
left join lateral (
select
data_b.Name
from
data_b
) db ON da.Name like '%' || db.Name || '%'
or db.Name like '%' || da.Name || '%'
)
select
*
from
(
select
row_number() over (
partition by da_name
order by
similarity
) r,
t.*
FROM
data t
) x
WHERE
x.r <= 2;
Demo in sql<>daddy.io
CodePudding user response:
If you want to apply the LIMIT separately per da.Name, you would have to do it in a LATERAL subquery:
SELECT da.Name as Name_a,db.Name , similarity(da.Name,db.Name)
FROM data_a da CROSS JOIN LATERAL
(
SELECT db.Name
FROM data_b db
WHERE da.Name % db.Name
ORDER BY similarity(da.Name,db.Name) DESC
LIMIT 2
) db;