Home > Software design >  Group by cause double records
Group by cause double records

Time:08-04

I have TWO table:

TABLE ALBUM

id AlbumDescription
1 Siamo Qui
2 Safari
3 Perfetto

TABLE SONG

id idAlbum SongDescription Duration
1 1 AlbaChiara 30
2 1 Gabri 40
3 2 ombellico 50
4 2 I love you baby 10
5 3 cose della vita 60
6 3 aurora 20

The tables are linked via Song.idAlbum and Album.id

Now I have to make a query that extrapolates the song that lasts longer for each album, displaying the name of the song as well

By doing this query I have duplicate records, what is the best approach to eliminate duplicate records?

SELECT MAX(Duration) as Duration
,A.AlbumDescription
,C.SongDescription
FROM Album A
left JOIN Canzone C ON A.id = C.idAlbum
GROUP BY A.AlbumDescription, C.SongDescription

Result:

Duration AlbumDescription SongDescription
30 Siamo Qui AlbaChiara
60 Perfetto Cose della vita
40 Siamo Qui Gabri
10 Safari I love you baby
20 Perfetto aurora
50 Safari ombellico

What would i expect

Duration AlbumDescription SongDescription
60 Perfetto Cose della vita
40 Siamo Qui Gabri
50 Safari ombellico

CodePudding user response:

We can use RANK() here:

WITH cte AS (
    SELECT a.AlbumDescription, c.SongDescription, c.Duration,
           RANK() OVER (PARTITION BY a.id ORDER BY c.Duration DESC) rnk
    FROM Album a
    INNER JOIN Canzone c ON c.idAlbum = a.id
)

SELECT Duration, AlbumDescription, SongDescription
FROM cte
WHERE rnk = 1;

Note that if you don't expect or care about the possibility for two or more songs on the same album having the same maximum length, then you may also use ROW_NUMBER above instead of RANK. As written, the above would return all songs on a given album tied for being the longest.

We can write a shorter version of the above using a TOP 1 WITH TIES trick:

SELECT TOP 1 WITH TIES a.AlbumDescription, c.SongDescription, c.Duration
FROM Album a
INNER JOIN Canzone c ON c.idAlbum = a.id
ORDER BY RANK() OVER (PARTITION BY a.id ORDER BY c.Duration DESC);

But note that this probably will not perform as well as the first version, and it should only be used when you don't care much about performance but more about code brevity.

CodePudding user response:

When you use group by with both AlbumDescription, SongDescription it makes 2 groups. Because (Siamo Qui,AlbaChiara) and (Siamo Qui,Gabri) are different. As you want to find the song with the longest duration you should group it by AlbumDescription. You might want to use album id.

SELECT MAX(Duration) as Duration
,A.AlbumDescription
,C.SongDescription
FROM Album A
left JOIN song C ON A.id = C.idAlbum
GROUP BY A.AlbumDescription;

this query returns:

Duration AlbumDescription SongDescription
60 Perfetto Cose della vita
40 Siamo Qui Gabri
50 Safari ombellico

Or you can use

SELECT FROM album as A 
LEFT JOIN song AS S on a.id = s.idAlbum 
WHERE S.duration == (SELECT max(duration) 
                     FROM song WHERE idAlbum = S.idAlbum);

both return the same result.

These are the table in data in case anyone wants to try;

CREATE TABLE album
(
 id INTEGER,
 albumDescription TEXT
);

CREATE TABLE song
(
  id INTEGER,
  idAlbum INTEGER REFERENCES album(id),
  songDescription TEXT,
  duration INTEGER
);

INSERT INTO album VALUES
(1, 'Siamo Qui'),(2, 'Safari'),(3, 'Perfetto');

INSERT INTO song VALUES
(1,1, 'AlbaChiara',30),
(2,1, 'Gabri',40),
(3,2, 'ombellico',50),
(4,2, 'I love you baby',10),
(5,3, 'cose della vita',60),
(6,3, 'aurora',20);

CodePudding user response:

You can also put aggregation on sone description like this:

SELECT MAX(Duration) as Duration
,A.AlbumDescription
,MAX(C.SongDescription) as SongDescription
FROM Album A
left JOIN Canzone C ON A.id = C.idAlbum
GROUP BY A.AlbumDescription
  • Related