Home > other >  SQL shows the same row tree times
SQL shows the same row tree times

Time:03-13

I'm trying to understand how many new vaccinations were in every country and ending up with the same date 3 times which messes with the calculations because it added everything together

    Select dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations, Sum(convert(BIGINT, vac.new_vaccinations)) 
OVER (Partition by dea.location ORDER by dea.location, dea.Date) as RollingPeopleVaccinated
FROM Portfolio_Project..covid_death_please_work$ as dea
JOIN Portfolio_Project..covid_vacsenation$ as vac
    on dea.location = vac.location
    and dea.date = vac.date
Where  dea.continent is not null and vac.new_vaccinations is not null
order by 2,3

this is what i get:

Asia    Afghanistan 2021-05-27 00:00:00.000 39835428    2859    8577
Asia    Afghanistan 2021-05-27 00:00:00.000 39835428    2859    8577
Asia    Afghanistan 2021-05-27 00:00:00.000 39835428    2859    8577
Asia    Afghanistan 2021-06-03 00:00:00.000 39835428    4015    20622
Asia    Afghanistan 2021-06-03 00:00:00.000 39835428    4015    20622
Asia    Afghanistan 2021-06-03 00:00:00.000 39835428    4015    20622
Asia    Afghanistan 2022-01-27 00:00:00.000 39835428    6868    41226

CodePudding user response:

try this:

Select distinct dea.continent, dea.location, dea.date, dea.population, 
vac.new_vaccinations, Sum(convert(BIGINT, vac.new_vaccinations)) 
OVER (Partition by dea.location ORDER by dea.location, dea.Date) as 
RollingPeopleVaccinated
FROM Portfolio_Project..covid_death_please_work$ as dea
JOIN Portfolio_Project..covid_vacsenation$ as vac
on dea.location = vac.location
and dea.date = vac.date
Where  dea.continent is not null and vac.new_vaccinations is not null
order by 2,3

CodePudding user response:

Try to sort inside the window only by date. As I understand it, you need a cumulative total by date?

SELECT
    dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
    SUM(convert(BIGINT, vac.new_vaccinations))
      OVER (PARTITION BY dea.location ORDER dea.Date) AS RollingPeopleVaccinated
  FROM Portfolio_Project..covid_death_please_work$ as dea
  JOIN Portfolio_Project..covid_vacsenation$ as vac
    ON dea.location = vac.location
    AND dea.date = vac.date
  WHERE  dea.continent is not null AND vac.new_vaccinations is not null
  ORDER BY 2,3

UPDATE:

Also, there is a suspicion that you get duplicated data in a query with a JOIN.

Try like this

SELECT
    t.continent, t.location, t.date, t.population, t.new_vaccinations,
    SUM(convert(BIGINT, t.new_vaccinations))
      OVER (PARTITION BY t.location ORDER t.date) AS RollingPeopleVaccinated
  FROM (
    SELECT DISTINCT
        dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
      FROM Portfolio_Project..covid_death_please_work$ as dea
      JOIN Portfolio_Project..covid_vacsenation$ as vac
        ON dea.location = vac.location
        AND dea.date = vac.date
      WHERE  dea.continent is not null AND vac.new_vaccinations is not null
  ) t
  •  Tags:  
  • sql
  • Related