SQL - How do I calculate the sum only for distinct rows of another column?

Time:02-26

I have a table of covid data with many entries per country. Each row has the Country, Continent, Population and other columns that are not relevant here. I simply want to calculate the population per CONTINENT, but I can't figure out how.

Continent  Country   Population   ...
Europe     Bulgaria  690.000
Europe     Bulgaria  690.000
Europe     Bulgaria  690.000
America    Brazil    212.000.000
America    Brazil    212.000.000
America    Brazil    212.000.000
America    Brazil    212.000.000
...        ...       ...

SELECT distinct country, continent, SUM(population) as TotalPopulation
FROM PortfolioProject..CovidDeaths
GROUP by continent

This is what I tried first, but it produces an error: Column 'PortfolioProject..CovidDeaths.location' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

And this code snippet sums the population of every row, so each country is counted once per row it appears in. As a result the continent populations are ridiculously high.

SELECT continent, SUM(population) as TotalPopulation
From PortfolioProject..CovidDeaths
GROUP by continent

Can somebody help me?

CodePudding user response:

That data probably also contains a column that indicates when the population total was reported.

In that case you'd want the latest reported population totals for the countries.

So let's assume there is such a date column, named "ReportedAt".

Then you can use ROW_NUMBER to pick the latest record per country, and then aggregate per continent.

SELECT continent, SUM(population) AS TotalPopulation
FROM
(
  -- Number each country's rows, newest ReportedAt first
  SELECT country, continent, population
  , rn = ROW_NUMBER() OVER (PARTITION BY country ORDER BY ReportedAt DESC)
  FROM PortfolioProject..CovidDeaths
) q
WHERE rn = 1  -- keep only the latest row per country
GROUP BY continent
ORDER BY continent;
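
If the table turns out not to have a usable date column, a variation on the same idea might still work: collapse the data to one row per country first, then sum per continent. This is only a sketch under an assumption not confirmed in the question, namely that a country's population is the same on all of its rows. The alias CountryPopulation is made up for the example; the table and column names are the ones used in the question.

```sql
-- Sketch: one row per country first, then the sum per continent.
-- Assumption: population does not change between a country's rows,
-- so MAX() just picks that single value (or the largest, if it does vary).
SELECT continent, SUM(CountryPopulation) AS TotalPopulation
FROM
(
  SELECT continent, country, MAX(population) AS CountryPopulation
  FROM PortfolioProject..CovidDeaths
  GROUP BY continent, country
) c
GROUP BY continent
ORDER BY continent;
```

If the population really never varies per country, SELECT DISTINCT continent, country, population in the inner query should give the same result. With the sample rows from the question, Bulgaria and Brazil would then each be counted once instead of once per row.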