Home > Software design >  How to get rid of duplicates with multiple-table joins and aggregate function in MySQL?
How to get rid of duplicates with multiple-table joins and aggregate function in MySQL?

Time:11-27

When joining three tables, I get the desired values, but when I join more, the values begin to duplicate.

I understand what's going on (each currency is recorded in each category), but I don't know how to write a query to get rid of it.

Here are examples of tables: volunteers categories volunteer_category currencies volunteer_currency

When i write a query:

SELECT DISTINCT v.id, v.name, v.type, v.logo, v.description, JSON_ARRAYAGG(c.category) AS categories 
    FROM volunteers v 
      INNER JOIN volunteer_category vc ON v.id = vc.volunteer_id 
      INNER JOIN categories c ON vc.category_id = c.id
    GROUP BY v.id

The result looks good:

id: 1,
name: "Serhiy Prytula",
type: "Charity foundation of",
logo: "https://scontent.fdnk6-2.fna.fbcdn.net/v/t39.30808-1/294616431_145621521453151_1955718552327534606_n.jpg?stp=cp0_dst-jpg_p40x40&_nc_cat=107&ccb=1-7&_nc_sid=1eb0c7&_nc_ohc=fxJFxGLTc_QAX8_9ZS9&_nc_ht=scontent.fdnk6-2.fna&oh=00_AfAjiJWqlm46HddMDALnVOqeuHmqoCGI6pgiM6lAdLKKxg&oe=636D0B6E",
description: "Owner is a well-known Ukrainian TV presenter, actor, stand-up comedian, author and co-producer of several TV shows since 1998. Former resident of Comedy Club Ukraine. Serhiy Prytula has been helping the army as a private volunteer since Russia began its occupation of Crimea and Donbas. At first, he helped with equipment, armor, and food. Subsequently, he began to provide units with high-precision equipment, drones and equipment for command posts.",
categories: "["vehicles", "surveillance", "communication", "weapons", "mechanical_repair", "training", "humanitarian_aid", "housing"]"
},

But when I add currency to the query:

SELECT DISTINCT v.id, v.name, v.type, v.logo, v.description, JSON_ARRAYAGG(c.category) AS categories, JSON_ARRAYAGG(cur.code) AS currencies 
    FROM volunteers v 
      INNER JOIN volunteer_category vc ON v.id = vc.volunteer_id 
      INNER JOIN categories c ON vc.category_id = c.id
      INNER JOIN volunteer_currency vcur ON v.id = vcur.volunteer_id
      INNER JOIN currencies cur ON vcur.currency_id = cur.id
    GROUP BY v.id

The result is:

{
id: 1,
name: "Serhiy Prytula",
type: "Charity foundation of",
logo: "https://scontent.fdnk6-2.fna.fbcdn.net/v/t39.30808-1/294616431_145621521453151_1955718552327534606_n.jpg?stp=cp0_dst-jpg_p40x40&_nc_cat=107&ccb=1-7&_nc_sid=1eb0c7&_nc_ohc=fxJFxGLTc_QAX8_9ZS9&_nc_ht=scontent.fdnk6-2.fna&oh=00_AfAjiJWqlm46HddMDALnVOqeuHmqoCGI6pgiM6lAdLKKxg&oe=636D0B6E",
description: "Owner is a well-known Ukrainian TV presenter, actor, stand-up comedian, author and co-producer of several TV shows since 1998. Former resident of Comedy Club Ukraine. Serhiy Prytula has been helping the army as a private volunteer since Russia began its occupation of Crimea and Donbas. At first, he helped with equipment, armor, and food. Subsequently, he began to provide units with high-precision equipment, drones and equipment for command posts.",
categories: "["vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "training", "training", "training", "training", "training", "training", "training", "training", "training", "training", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "housing", "housing", "housing", "housing", "housing", "housing", "housing", "housing", "housing", "housing"]",
currencies: "["uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth"]"
}

I want to remove duplicates and optimize the query because I will be joining more tables.

I also tried using JSON_ARRAYAGG in INNER JOIN but don't know how to unpack json in ON clause

INNER JOIN (SELECT volunteer_id, JSON_ARRAYAGG(currency_id) AS currency_id FROM volunteer_currency GROUP BY volunteer_id) AS vcur ON v.id = vcur.volunteer_id
INNER JOIN currencies cur ON ???

CodePudding user response:

You already know whats happening. You join all categories and all currencies to a volunteer, so for a volunteer with two categories and two currencies you get four rows of unrelated category-currency pairs. You then aggregate your data per volunteer, but that data is already flawed.

Instead, join the category array and the currency array to a volunteer:

SELECT
  v.id, v.name, v.type, v.logo, v.description,
  cat.categories,
  cur.currencies
FROM volunteers v 
INNER JOIN
(
  SELECT vc.volunteer_id, JSON_ARRAYAGG(c.category) AS categories
  FROM volunteer_category vc
  INNER JOIN categories c ON c.id = vc.category_id
  GROUP BY vc.volunteer_id
) cat ON cat.volunteer_id = v.id
INNER JOIN
(
  SELECT vc.volunteer_id, JSON_ARRAYAGG(c.code) AS currencies 
  FROM volunteer_currency vc
  INNER JOIN currencies c ON c.id = vc.currency_id
  GROUP BY vc.volunteer_id
) cur ON cur.volunteer_id = v.id
ORDER BY v.id;

And if there can be volunteers without a catergory or without a currency and you still want to show them, change your inner joins to left outer joins.

  • Related