Home > Back-end >  SQL - How to avoid a repeating row when joining tables via a junction table?
SQL - How to avoid a repeating row when joining tables via a junction table?

Time:03-03

I'm building a Pokemon API, and I'm having trouble with a repeating row when joining via a junction table. It is a SQL Server database. A watered down version of my tables looks like this:

Database Tables

To recreate my database:

CREATE DATABASE PokeApi;
USE PokeApi;

CREATE TABLE Pokemon (
    PokemonId INT PRIMARY KEY IDENTITY,
    PokemonName VARCHAR(55),
    NationalDexNumber INT,
    PokemonHeight INT,
    PokemonWeight DECIMAL(10,2),
    Category VARCHAR(50),
    Gender VARCHAR(15),
    GenderDifference BIT,
    TypeOne VARCHAR(15),
    TypeTwo VARCHAR(15),
);

CREATE TABLE Abilities (
    AbilityId INT PRIMARY KEY IDENTITY,
    AbilityName VARCHAR(20)
);

CREATE TABLE PokemonAbilitiesMapped (
    PokemonId INT NOT NULL,
    AbilityId INT,
    CONSTRAINT PK_PokemonAbilitiesMapped PRIMARY KEY
    (
        PokemonId,
        AbilityId
    ),
    FOREIGN KEY(PokemonId) REFERENCES Pokemon(PokemonId),
    FOREIGN KEY(AbilityId) REFERENCES Abilities(AbilityId)
);

INSERT INTO Pokemon
VALUES
  ('Venusaur', 3, 79, 220.5, 'Seed', 'Male / Female', 1, 'Grass', 'Poison'),
  ('Mega Venusaur', 3, 94, 342.8, 'Seed', 'Male / Female', 0, 'Grass', 'Poison'),
  ('Gigantamax Venusaur', 3, 945, NULL, 'Seed', 'Male / Female', 1, 'Grass', 'Poison');

INSERT INTO Abilities
VALUES
  ('Overgrow'),
  ('Thick Fat'),
  ('Test');

INSERT INTO PokemonAbilitiesMapped
VALUES
  (1, 1),
  (2, 2),
  (3, 1),
  (3, 3);

My query:

SELECT p.PokemonName, p.NationalDexNumber, p.PokemonHeight, p.PokemonWeight,
p.Category, p.Gender, p.GenderDifference, p.TypeOne, p.TypeTwo,
a.AbilityName
FROM Pokemon p
JOIN PokemonAbilitiesMapped pam
ON p.PokemonId = pam.PokemonId
JOIN Abilities a
ON pam.AbilityId = a.AbilityId

Outputs this:

PokemonName NationalDexNumber PokemonHeight PokemonWeight Category Gender GenderDifference TypeOne TypeTwo AbilityName
Venusaur 3 79 220.50 Seed Male / Female True Grass Poison Overgrow
Mega Venusaur 3 94 342.80 Seed Male / Female False Grass Poison Thick Fat
Gigantamax Venusaur 3 945 null Seed Male / Female True Grass Poison Overgrow
Gigantamax Venusaur 3 945 null Seed Male / Female True Grass Poison Test

As you can see, Gigantamax Venusaur is listed twice because it has two abilities. I was thinking that I could add another column called "AbilityNameTwo" only if the pokemon queried had a second ability to avoid the row repeating.

What is the best way to avoid repeating data in this scenario? If I should add another column when joining, how would I write that as a query?

CodePudding user response:

It's perfectly normal to have results like that when joining data. If you really want to, you can have "ability two", but what if some pokemon has three, or five? And how do you order them? Here's an example of how you'd do it for two abilities and ordering alphabetically:

WITH abilitiesMapped AS
(
    SELECT pam.PokemonId
         , a.AbilityName
         , ROW_NUMBER() OVER (PARTITION BY pam.PokemonId 
                                  ORDER BY a.AbilityName) AS abilityNo
      FROM PokemonAbilitiesMapped pam
      JOIN Abilities a
        ON a.AbilityId = pam.AbilityId
)
SELECT p.PokemonName, p.NationalDexNumber, p.PokemonHeight, p.PokemonWeight,
p.Category, p.Gender, p.GenderDifference, p.TypeOne, p.TypeTwo
   , ability1.abilityName AS ability1
   , ability2.abilityName AS ability2
FROM Pokemon p
LEFT OUTER
JOIN abilitiesMapped ability1
  ON ability1.PokemonId = p.PokemonId
 AND ability1.abilityNo = 1
LEFT OUTER
JOIN abilitiesMapped ability2
  ON ability2.PokemonId = p.PokemonId
 AND ability1.abilityNo = 2

CTE abilitiesMapped takes your mappings and abilities and generates an abilityNo for each pokemon (PARTITION BY pam.PokemonId) where the number is ordered by the ability name (ORDER BY ...). Then this CTE is joined twice for abilityNo 1 and 2. LEFT JOIN is important because a pokemon does not necessarily need to have two abilities, and you will get a NULL in ability2 in such case (with an INNER JOIN you'd only get the ones that have two abilities because abilityNo = 2 needs to exist in order to satisfy the join condition).

CodePudding user response:

Thanks for your post with create, insert statements. It was very helpful.

To avoid duplication of the ability name, You can make the ability names as comma separated list. I am using STR_AGG to generate comma separated list. It is available from SQL Server 2017 onwards. For earlier versions, you have to use FOR XML PATH to generate comma separated list.

SELECT p.PokemonName, p.NationalDexNumber, p.PokemonHeight, p.PokemonWeight,
p.Category, p.Gender, p.GenderDifference, p.TypeOne, p.TypeTwo,
STRING_AGG(a.AbilityName,',') AS Abilities
FROM Pokemon p
JOIN PokemonAbilitiesMapped pam
ON p.PokemonId = pam.PokemonId
JOIN Abilities a
ON pam.AbilityId = a.AbilityId
GROUP BY p.PokemonName, p.NationalDexNumber, p.PokemonHeight, p.PokemonWeight,
p.Category, p.Gender, p.GenderDifference, p.TypeOne, p.TypeTwo
PokemonName NationalDexNumber PokemonHeight PokemonWeight Category Gender GenderDifference TypeOne TypeTwo Abilities
Gigantamax Venusaur 3 945 NULL Seed Male / Female 1 Grass Poison Overgrow,Test
Mega Venusaur 3 94 342.80 Seed Male / Female 0 Grass Poison Thick Fat
Venusaur 3 79 220.50 Seed Male / Female 1 Grass Poison Overgrow
  • Related