Introduction
I have a database with 3 columns: -
Country
Year
Electricity Generation
Sample table values for your reference: -
Country | Year | Electricity Generation (TWh) |
---|---|---|
Afghanistan | 2000 | 0.47 |
Afghanistan | 2001 | 0.59 |
. | . | . |
. | . | . |
Afghanistan | 2020 | 0.78 |
Australia | 1985 | 123.673 |
Australia | 1986 | 129.527 |
. | . | . |
. | . | . |
Australia | 2021 | 267.45227 |
The Year
column superset has the year values between 1985 and 2021.
Problem Statement
I want to see the result set of only those countries which has complete data i.e. countries which has the data from the year 1985 till 2021.
I am facing three types of abnormalities: -
- Some country's
Year
value start from the year later than 1985. - Some country's
Year
value end at the year earlier than 2021. - Some country's
Year
value start from the year 1985 and end at 2021, but has missing year values in between like the data for the year 2001, 2005 etc. are missing.
Desired Solution
A SQL Query which displays the data for only those countries which has all the Year
column values (i.e. in the set of values from 1985 till 2021)
The resulting table should look something like this: -
Country | Year | Electricity Generation (TWh) |
---|---|---|
Australia | 1985 | 123.673 |
Australia | 1986 | 129.527 |
. | . | . |
. | . | . |
Australia | 2021 | 267.45227 |
USA | 1985 | 2657.1501 |
USA | 1986 | 2676.1130 |
. | . | . |
. | . | . |
USA | 2021 | 4406.4130 |
SQL Query Trial
The following MySQL query is what I have written till now: -
SELECT *
FROM meta_generation
WHERE `Year` BETWEEN 1985 AND 2021
GROUP BY `Country`, `Year`;
Naturally the above query also gives me the value for the countries who suffer from the abnormalities as stated above. Please help in how to achieve my desired result set.
Followup Question
The answer provided by @FlexYourData is correct one, but I tried to modify the query in another format which gave me the following error: -
Error Code: 1038. Out of sort memory, consider increasing server sort buffer size.
This is my modified query: -
SELECT
*
FROM
meta_generation
WHERE
`Country` IN
(SELECT
`Country`
FROM
meta_generation
WHERE
`Year` BETWEEN 1985 AND 2021
GROUP BY
`Country`
HAVING
COUNT(DISTINCT Year) = 37);
Why did this happen? Is using Nested Queries not a good practice?
CodePudding user response:
You need to count distinct years per country, then select only those countries with (2021-1985 1) = 37
years.
So,
WITH countries
AS
(
SELECT Country
FROM meta_generation
WHERE Year BETWEEN 1985 AND 2021
GROUP BY Country
HAVING COUNT(DISTINCT Year) = 37
)
SELECT m.*
FROM meta_generation m
INNER JOIN countries c ON m.Country = c.Country;