Home > Software engineering >  Displaying only those records which contains all the elements in a Superset for a specific column in
Displaying only those records which contains all the elements in a Superset for a specific column in

Time:12-21

Introduction

I have a database with 3 columns: -

  1. Country
  2. Year
  3. Electricity Generation

Sample table values for your reference: -

Country Year Electricity Generation (TWh)
Afghanistan 2000 0.47
Afghanistan 2001 0.59
. . .
. . .
Afghanistan 2020 0.78
Australia 1985 123.673
Australia 1986 129.527
. . .
. . .
Australia 2021 267.45227

The Year column superset has the year values between 1985 and 2021.

Problem Statement

I want to see the result set of only those countries which has complete data i.e. countries which has the data from the year 1985 till 2021.

I am facing three types of abnormalities: -

  1. Some country's Year value start from the year later than 1985.
  2. Some country's Year value end at the year earlier than 2021.
  3. Some country's Year value start from the year 1985 and end at 2021, but has missing year values in between like the data for the year 2001, 2005 etc. are missing.

Desired Solution

A SQL Query which displays the data for only those countries which has all the Year column values (i.e. in the set of values from 1985 till 2021)

The resulting table should look something like this: -

Country Year Electricity Generation (TWh)
Australia 1985 123.673
Australia 1986 129.527
. . .
. . .
Australia 2021 267.45227
USA 1985 2657.1501
USA 1986 2676.1130
. . .
. . .
USA 2021 4406.4130

SQL Query Trial

The following MySQL query is what I have written till now: -

SELECT * 
FROM meta_generation
WHERE `Year` BETWEEN 1985 AND 2021
GROUP BY `Country`, `Year`;

Naturally the above query also gives me the value for the countries who suffer from the abnormalities as stated above. Please help in how to achieve my desired result set.

Followup Question

The answer provided by @FlexYourData is correct one, but I tried to modify the query in another format which gave me the following error: -

Error Code: 1038. Out of sort memory, consider increasing server sort buffer size.

This is my modified query: -

    SELECT 
        *
    FROM 
        meta_generation
    WHERE 
        `Country` IN 
            (SELECT 
                `Country`
             FROM 
                 meta_generation
             WHERE 
                 `Year` BETWEEN 1985 AND 2021
             GROUP BY 
                 `Country`
             HAVING 
                  COUNT(DISTINCT Year) = 37);

Why did this happen? Is using Nested Queries not a good practice?

CodePudding user response:

You need to count distinct years per country, then select only those countries with (2021-1985 1) = 37 years.

So,

WITH countries
AS
(
SELECT Country
FROM meta_generation
WHERE Year BETWEEN 1985 AND 2021
GROUP BY Country
HAVING COUNT(DISTINCT Year) = 37
)
SELECT m.*
FROM meta_generation m 
INNER JOIN countries c ON m.Country = c.Country;
  • Related