Home > Software design >  How can I modify my inner join so that it only returns 1 row per item?
How can I modify my inner join so that it only returns 1 row per item?

Time:08-16

I have a table, hs_book that has column called GENRE_ID

I get the description of that genre from a table called GENRE

My subquery for column Genre :

SELECT *,
    Genre = (
        SELECT DESCRIPTION
        FROM GENRE
        WHERE GENRE_ID = (
            SELECT TOP 1 GENRE_ID 
            FROM BOOK_GENRES
            WHERE BOOK_ID = hs_book.BOOK_ID
            )
    )
FROM hs_book

When I execute this query, I get 525 books (which is correct).

At the request of a senior DBA, I am trying to convert it to a JOIN so I don't need the subquery, so I did this:

SELECT * FROM hs_book hsb
INNER JOIN BOOK_GENRES bg ON hsb.BOOK_ID = bg.BOOK_ID

When I run that, I get 541 results back, which is 16 more than the subquery.

After checking the data, I can see somehow, a few books have multiple GENRE_IDs.

Is there a way to modify my join so that I get only one genre back for each book?

CodePudding user response:

You need to ensure that BOOK_GENRES returns a single row per BOOK_ID.

  • Either aggregate it
SELECT
  hsb.ID,
  hsb.Name,
  STRING_AGG(bg.Genre, ', ') Genre
FROM hs_book hsb
INNER JOIN BOOK_GENRES bg ON hsb.BOOK_ID = bg.BOOK_ID
GROUP BY
  hsb.ID,
  hsb.Name;
  • Or pre-aggregate it in a derived table
SELECT
  hsb.ID,
  hsb.Name,
  bg.Genre
FROM hs_book hsb
INNER JOIN (
    SELECT
      bg.BOOK_ID,
      STRING_AGG(bg.Genre, ', ') Genre
    FROM BOOK_GENRES bg
    GROUP BY
      bg.BOOK_ID
) bg ON hsb.BOOK_ID = bg.BOOK_ID;

You can also do this with an APPLY.

SELECT
  hsb.ID,
  hsb.Name,
  bg.Genre
FROM hs_book hsb
CROSS APPLY (
    SELECT
      STRING_AGG(bg.Genre, ', ') Genre
    FROM BOOK_GENRES bg
    WHERE hsb.BOOK_ID = bg.BOOK_ID
    GROUP BY
      ()
) bg;
  • Or use ROW_NUMBER to filter it. If you don't care which order then use ORDER BY (SELECT NULL).
SELECT
  hsb.ID,
  hsb.Name,
  bg.Genre
FROM hs_book hsb
INNER JOIN (
    SELECT
      BG.BOOK_ID,
      bg.Genre,
      ROW_NUMBER() OVER (PARTITION BY bg.BOOK_ID ORDER BY bg.Genre) rn
    FROM BOOK_GENRES bg
) bg ON hsb.BOOK_ID = bg.BOOK_ID AND bg.rn = 1;

The first three options are likely all the same performance, as the compiler can usually convert between them.

The final one is likely similar to your original query.

Note that these have slightly different semantics than your original, as they use INNER JOIN and CROSS APPLY. You may want to switch to LEFT JOIN or OUTER APPLY.

  • Related