I have a table, hs_book
that has column called GENRE_ID
I get the description of that genre from a table called GENRE
My subquery for column Genre
:
SELECT *,
Genre = (
SELECT DESCRIPTION
FROM GENRE
WHERE GENRE_ID = (
SELECT TOP 1 GENRE_ID
FROM BOOK_GENRES
WHERE BOOK_ID = hs_book.BOOK_ID
)
)
FROM hs_book
When I execute this query, I get 525 books (which is correct).
At the request of a senior DBA, I am trying to convert it to a JOIN so I don't need the subquery, so I did this:
SELECT * FROM hs_book hsb
INNER JOIN BOOK_GENRES bg ON hsb.BOOK_ID = bg.BOOK_ID
When I run that, I get 541 results back, which is 16 more than the subquery.
After checking the data, I can see somehow, a few books have multiple GENRE_IDs
.
Is there a way to modify my join so that I get only one genre back for each book?
CodePudding user response:
You need to ensure that BOOK_GENRES
returns a single row per BOOK_ID
.
- Either aggregate it
SELECT
hsb.ID,
hsb.Name,
STRING_AGG(bg.Genre, ', ') Genre
FROM hs_book hsb
INNER JOIN BOOK_GENRES bg ON hsb.BOOK_ID = bg.BOOK_ID
GROUP BY
hsb.ID,
hsb.Name;
- Or pre-aggregate it in a derived table
SELECT
hsb.ID,
hsb.Name,
bg.Genre
FROM hs_book hsb
INNER JOIN (
SELECT
bg.BOOK_ID,
STRING_AGG(bg.Genre, ', ') Genre
FROM BOOK_GENRES bg
GROUP BY
bg.BOOK_ID
) bg ON hsb.BOOK_ID = bg.BOOK_ID;
You can also do this with an APPLY
.
SELECT
hsb.ID,
hsb.Name,
bg.Genre
FROM hs_book hsb
CROSS APPLY (
SELECT
STRING_AGG(bg.Genre, ', ') Genre
FROM BOOK_GENRES bg
WHERE hsb.BOOK_ID = bg.BOOK_ID
GROUP BY
()
) bg;
- Or use
ROW_NUMBER
to filter it. If you don't care which order then useORDER BY (SELECT NULL)
.
SELECT
hsb.ID,
hsb.Name,
bg.Genre
FROM hs_book hsb
INNER JOIN (
SELECT
BG.BOOK_ID,
bg.Genre,
ROW_NUMBER() OVER (PARTITION BY bg.BOOK_ID ORDER BY bg.Genre) rn
FROM BOOK_GENRES bg
) bg ON hsb.BOOK_ID = bg.BOOK_ID AND bg.rn = 1;
The first three options are likely all the same performance, as the compiler can usually convert between them.
The final one is likely similar to your original query.
Note that these have slightly different semantics than your original, as they use INNER JOIN
and CROSS APPLY
. You may want to switch to LEFT JOIN
or OUTER APPLY
.