Home > Software design >  Identifying duplicates with two criterias with MySQL workbench
Identifying duplicates with two criterias with MySQL workbench

Time:11-07

I'm using MySQL Workbench to extract a list.

This is what my database looks like: database

I'm trying to identify the composer with the most duplicated tracks in this database.

This is the code I entered:

SELECT Name, Composer, Count(*) FROM Track

GROUP BY Name, Composer

HAVING Count(*)>1

ORDER BY Count(*) DESC;

As a result of this query, I have the number of duplicates per tracks, alongside the composer. The thing is I have duplicates in composers. Therefore, my code gives me the track that is the most duplicated and its associated composer. This is not what I'm looking for. How can I have identify the composer with the most duplicated tracks in this database?

Here is a result of my query if it can help: query-result

CodePudding user response:

Use your query as a subquery, and wrap another query around it. (This building of queries from subqueries is part of the Structured in Structured Query Language).

Something like this might work.

SELECT Composer, SUM(tracks) duplicate_tracks
  FROM (SELECT Name, Composer, Count(*) tracks 
               FROM Track
              GROUP BY Name, Composer
             HAVING COUNT(*) > 1
       ) duptracks
 GROUP BY Composer
 ORDER BY COUNT(*) DESC;

It uses your query as if it were a table (which it is, a virtual table) and runs a GROUP BY on it. To use your query that way, it needs a name on the COUNT(*) item, so I gave it the name tracks. It also needs a name on the subquery. I used duptracks for that.

The outer query adds up all the duplicated tracks by composer with SUM() ... GROUP BY Composer then orders them presenting the composer with the most duplicated tracks first.

But notice that this works on the contents of your Composer columns as a whole. It will use John, Paul, Ringo, George as if they were just one composer, and not count duplicates separately for each person.

  • Related