I'm trying to understand how many new vaccinations were in every country and ending up with the same date 3 times which messes with the calculations because it added everything together
Select dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations, Sum(convert(BIGINT, vac.new_vaccinations))
OVER (Partition by dea.location ORDER by dea.location, dea.Date) as RollingPeopleVaccinated
FROM Portfolio_Project..covid_death_please_work$ as dea
JOIN Portfolio_Project..covid_vacsenation$ as vac
on dea.location = vac.location
and dea.date = vac.date
Where dea.continent is not null and vac.new_vaccinations is not null
order by 2,3
this is what i get:
Asia Afghanistan 2021-05-27 00:00:00.000 39835428 2859 8577
Asia Afghanistan 2021-05-27 00:00:00.000 39835428 2859 8577
Asia Afghanistan 2021-05-27 00:00:00.000 39835428 2859 8577
Asia Afghanistan 2021-06-03 00:00:00.000 39835428 4015 20622
Asia Afghanistan 2021-06-03 00:00:00.000 39835428 4015 20622
Asia Afghanistan 2021-06-03 00:00:00.000 39835428 4015 20622
Asia Afghanistan 2022-01-27 00:00:00.000 39835428 6868 41226
CodePudding user response:
try this:
Select distinct dea.continent, dea.location, dea.date, dea.population,
vac.new_vaccinations, Sum(convert(BIGINT, vac.new_vaccinations))
OVER (Partition by dea.location ORDER by dea.location, dea.Date) as
RollingPeopleVaccinated
FROM Portfolio_Project..covid_death_please_work$ as dea
JOIN Portfolio_Project..covid_vacsenation$ as vac
on dea.location = vac.location
and dea.date = vac.date
Where dea.continent is not null and vac.new_vaccinations is not null
order by 2,3
CodePudding user response:
Try to sort inside the window only by date. As I understand it, you need a cumulative total by date?
SELECT
dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(convert(BIGINT, vac.new_vaccinations))
OVER (PARTITION BY dea.location ORDER dea.Date) AS RollingPeopleVaccinated
FROM Portfolio_Project..covid_death_please_work$ as dea
JOIN Portfolio_Project..covid_vacsenation$ as vac
ON dea.location = vac.location
AND dea.date = vac.date
WHERE dea.continent is not null AND vac.new_vaccinations is not null
ORDER BY 2,3
UPDATE:
Also, there is a suspicion that you get duplicated data in a query with a JOIN.
Try like this
SELECT
t.continent, t.location, t.date, t.population, t.new_vaccinations,
SUM(convert(BIGINT, t.new_vaccinations))
OVER (PARTITION BY t.location ORDER t.date) AS RollingPeopleVaccinated
FROM (
SELECT DISTINCT
dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
FROM Portfolio_Project..covid_death_please_work$ as dea
JOIN Portfolio_Project..covid_vacsenation$ as vac
ON dea.location = vac.location
AND dea.date = vac.date
WHERE dea.continent is not null AND vac.new_vaccinations is not null
) t