Using where clause with not-equal condition after join-CodePudding

I am trying to use WHERE with not-equal condition after joining two tables but it does not work. Example: I have a table with data on famous people and a separate table with their works. Some works can have several authors. So I want a table listing authors with their co-authors:

CREATE TABLE famous_people (id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    profession TEXT,
    birth_year INTEGER);
    
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Landau", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Lifshitz", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Fisher", "statistician", 1908);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Ginzburg", "physicist", 1916);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("A. Strugatsky", "writer", 1925);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("B. Strugatsky", "writer", 1933);


CREATE TABLE works (id INTEGER PRIMARY KEY AUTOINCREMENT,
    person_id INTEGER,
    work TEXT);
INSERT INTO works (person_id, work)
    VALUES (1, "Theoretical Physics");
INSERT INTO works (person_id, work)
    VALUES (2, "Theoretical Physics");
INSERT INTO works (person_id, work)
    VALUES (1, "Theory of Superconductivity");
INSERT INTO works (person_id, work)
    VALUES (4, "Theory of Superconductivity");
INSERT INTO works (person_id, work)
    VALUES (3, "Fisher test");
INSERT INTO works (person_id, work)
    VALUES (5, "Roadside Picnic");
INSERT INTO works (person_id, work)
    VALUES (6, "Roadside Picnic");
INSERT INTO works (person_id, work)
    VALUES (5, "Hard to Be a God");
INSERT INTO works (person_id, work)
    VALUES (6, "Hard to Be a God");

/* Co-authors */
SELECT a.name AS author, b.name AS coauthor FROM works
    JOIN famous_people a
    ON works.person_id = a.id
    JOIN famous_people b
    ON works.person_id = b.id;

It is Ok, except each author also has themselves as their own co-author, so I am trying to filter it out by adding WHERE author <> coauthor as the last line. But what I get is a table with two columns: work and name. Same weird result with WHERE a.name <> b.name Funny enough, WHERE author = coauthor works fine but this is not what I want.

Expected result: a table with 2 columns:

author co-author

Landau Lipshitz

A. Strugatsky B. Strugatsky

Fisher NULL

CodePudding user response：

Find all works that have two authors (using inner join on same work but different authors) and find all works that have one author (using not exists). Then combine the results:

SELECT w1.work, p1.name AS author, p2.name AS coauthor
FROM works AS w1
JOIN works AS w2 ON w1.work = w2.work AND w1.person_id < w2.person_id
JOIN famous_people AS p1 ON w1.person_id = p1.id
JOIN famous_people AS p2 ON w2.person_id = p2.id

UNION ALL

SELECT w1.work, p1.name, null
FROM works AS w1
JOIN famous_people AS p1 ON w1.person_id = p1.id
WHERE NOT EXISTS (
    SELECT 1
    FROM works AS w2
    WHERE w2.work = w1.work AND w2.person_id <> w1.person_id
)

Demo on DB<>Fiddle

CodePudding user response：

Your query cannot work. Keep in mind that a join works on rows. So there is one works row with one person ID that you look at at a time in your where clause. Then you join the person to the works row and then you join the person to the works row. That is the same person twice of course, because one works row only refers to one person.

This shows another, minor, problem. You call this table works. I would consider "Theoretical Physics" a work. You do so too; you named the column work. But then, why is the same work twice in the works table? This must not be. A works table shall store works, i.e. one work per row. What you have is a work_author table actually, and a work is uniquely identified by its title. This kind of makes sense; a title may uniquely identify a work - as long as no other author happens to name their work "Theoretical Physics", too :-( And as long as there are no typos in the table either.

This would be a better model:

person (person_id, name, birth_year, ...)
work (work_id, title, year, ...)
work_author (work_id, person_id)

If you have a typo in a title in this model, there is one row where you correct it and the data stays intact.

Now you want to get the authors of a work. This is easily done with aggregation:

select w.*, group_concat(p.name) as authors
from work_author wa
join person p on p.person_id = wa.person_id
join work w on w.work_id = wa.work_id
group by w.work_id
order by w.work_id;

You forgot to tell us your DBMS. As you are using double quotes where it must be single quotes according to the SQL standard, and your DBMS doesn't complain, this may be MySQL. (You should still always use single quotes for string literals.) For MySQL the string aggregation function is GROUP_CONCAT, so guessing MySQL, I used that in my query. Other DBMS use STRING_AGG, LISTAGG or something else.

If you just want to show up to two authors per work, you can take the minimum and maximum name (and compare the two in order not to show the same author twice):

select
  w.*, 
  min(p.name) as author1,
  case when min(p.name) <> max(p.name) then max(p.name) end as author2
from ...

UPDATE

In the comments you say that for every author you want to know all authors who worked with them. For this you need to join authors to authors based on their works. Still assuming MySQL:

select p1.name, group_concat(distinct p2.name) as others
from work_author wa1
join work_author wa2 on wa2.work_id = wa1.work_id
                     and wa2.person_id <> wa1.person_id
join person p1 on p1.person_id = wa1.person_id
join person p2 on p2.person_id = wa2.person_id
group by p1.name
order by p1.name;

Or not aggregated:

select distinct p1.name as person1, p2.name as person2
from work_author wa1
join work_author wa2 on wa2.work_id = wa1.work_id
                     and wa2.person_id <> wa1.person_id
join person p1 on p1.person_id = wa1.person_id
join person p2 on p2.person_id = wa2.person_id
order by p1.name, p2.name;

CodePudding user response：

I changed the model as proposed by Thorsten Kettner and solved the task of matching authors with their co-authors as follows:

CREATE TABLE famous_people (id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    profession TEXT,
    birth_year INTEGER);
    
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Landau", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Lifshitz", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Fisher", "statistician", 1908);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("Ginzburg", "physicist", 1916);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("A. Strugatsky", "writer", 1925);
INSERT INTO famous_people (name, profession, birth_year)
    VALUES ("B. Strugatsky", "writer", 1933);

CREATE TABLE works (id INTEGER PRIMARY KEY AUTOINCREMENT,
    work TEXT,
    subject TEXT);
INSERT INTO works (work, subject)
    VALUES ("Theoretical Physics", "physics");
INSERT INTO works (work, subject)
    VALUES ("Theory of Superconductivity", "physics");
INSERT INTO works (work, subject)
    VALUES ("Fisher test", "statistics");
INSERT INTO works (work, subject)
    VALUES ("Roadside Picnic", "scifi");
INSERT INTO works (work, subject)
    VALUES ("Hard to Be a God", "scifi");

CREATE TABLE author_works (id INTEGER PRIMARY KEY AUTOINCREMENT,
    work_id INTEGER,
    author_id INTEGER);
INSERT INTO author_works (work_id, author_id) VALUES (1, 1);
INSERT INTO author_works (work_id, author_id) VALUES (1, 2);
INSERT INTO author_works (work_id, author_id) VALUES (2, 1);
INSERT INTO author_works (work_id, author_id) VALUES (2, 4);
INSERT INTO author_works (work_id, author_id) VALUES (3, 3);
INSERT INTO author_works (work_id, author_id) VALUES (4, 5);
INSERT INTO author_works (work_id, author_id) VALUES (4, 6);
INSERT INTO author_works (work_id, author_id) VALUES (5, 5);
INSERT INTO author_works (work_id, author_id) VALUES (5, 6);

/* List of authors and their works */
SELECT famous_people.name, works.work FROM author_works
    JOIN famous_people
    ON author_works.author_id = famous_people.id
    JOIN works
    ON works.id = author_works.work_id;
    
  
/* Authors and co-authors ids*/    
SELECT DISTINCT a.name, b.name 
    FROM author_works aw1
    JOIN author_works aw2
    ON aw1.work_id = aw2.work_id
    JOIN famous_people a
    ON aw1.author_id = a.id
    JOIN famous_people b
    ON aw2.author_id = b.id
    WHERE aw1.author_id <> aw2.author_id;