I have a table of COVID data with many columns. Each row has the country, continent, population, and other data that isn't relevant here. I simply want to calculate the population per continent, but I can't figure out how.
Continent | Country | Population | ... |
---|---|---|---|
Europe | Bulgaria | 690.000 | |
Europe | Bulgaria | 690.000 | |
Europe | Bulgaria | 690.000 | |
America | Brazil | 212.000.000 | |
America | Brazil | 212.000.000 | |
America | Brazil | 212.000.000 | |
America | Brazil | 212.000.000 | |
... | ... | ... | |
SELECT DISTINCT country, continent, SUM(population) AS TotalPopulation
FROM PortfolioProject..CovidDeaths
GROUP BY continent
This is what I tried first, but it produces an error: Column 'PortfolioProject..CovidDeaths.location' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The following query runs, but it adds a country's population once for every row of that country, so the continent totals come out far too high:
SELECT continent, SUM(population) AS TotalPopulation
FROM PortfolioProject..CovidDeaths
GROUP BY continent
Can somebody help me?
Answer:
The data probably also contains a column that indicates when each row's population figure was reported.
If so, you want the latest reported population for each country.
Let's assume that date column is named "ReportedAt".
You can then use ROW_NUMBER to number each country's rows from newest to oldest and keep only the newest one.
Finally, aggregate those single rows per continent.
SELECT continent, SUM(population) AS TotalPopulation
FROM
(
    -- Number each country's rows, newest first ("ReportedAt" is an assumed column name)
    SELECT country, continent, population
         , rn = ROW_NUMBER() OVER (PARTITION BY country ORDER BY ReportedAt DESC)
    FROM PortfolioProject..CovidDeaths
) q
WHERE rn = 1  -- keep only the latest row per country
GROUP BY continent
ORDER BY continent;
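If the table turns out to have no usable date column, a rough alternative is to collapse each country to a single row first and then sum those rows per continent. The sketch below assumes the population value is the same on every row of a country (as in the sample data), so MAX simply picks that one value. The column name country follows the question's query; the error message suggests the real column may be called location, so adjust the name as needed.

```sql
-- Sketch: one row per country first, then sum per continent.
-- Assumption: population is repeated unchanged on every row of a country,
-- so MAX(population) returns that single value.
SELECT continent, SUM(population) AS TotalPopulation
FROM
(
    SELECT continent, country, MAX(population) AS population
    FROM PortfolioProject..CovidDeaths
    GROUP BY continent, country
) q
GROUP BY continent
ORDER BY continent;
```

If population is stored as INT, the continent sums may overflow; casting with CAST(population AS BIGINT) inside the SUM avoids that.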