I have this table and I want to calculate inflation for every month in every city
I tried the lag and lead functions but the CPI was overlapping onto the next city for ex : Malmo's shifted CPI was Stockholm's 05-01-2005 CPI
CPI | Date | City |
---|---|---|
112 | 2005-01-01 | Stockholm |
113.5 | 2005-02-01 | Stockholm |
115 | 2005-03-01 | Stockholm |
115.6 | 2005-04-01 | Stockholm |
115.8 | 2005-05-01 | Stockholm |
106 | 2005-01-01 | Malmo |
107.5 | 2005--02-01 | Malmo |
110 | 2005-03-01 | Malmo |
113 | 2005-04-01 | Malmo |
117 | 2005-05-01 | Malmo |
CodePudding user response:
You need to group_by
to avoid this issue.
# Read in data
inflation <- read.table(text = "CPI Date City
112 2005-01-01 Stockholm
113.5 2005-02-01 Stockholm
115 2005-03-01 Stockholm
115.6 2005-04-01 Stockholm
115.8 2005-05-01 Stockholm
106 2005-01-01 Malmo
107.5 2005--02-01 Malmo
110 2005-03-01 Malmo
113 2005-04-01 Malmo
117 2005-05-01 Malmo", h = T)
# Perform calculation
library(dplyr)
inflation |>
group_by(City) |>
mutate(
cpi_change = lead(CPI) - CPI,
cpi_change_percent = cpi_change / CPI * 100
)
Output:
# A tibble: 10 x 5
# # Groups: City [2]
# CPI Date City cpi_change cpi_change_percent
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 112 2005-01-01 Stockholm 1.5 1.34
# 2 114. 2005-02-01 Stockholm 1.5 1.32
# 3 115 2005-03-01 Stockholm 0.600 0.522
# 4 116. 2005-04-01 Stockholm 0.200 0.173
# 5 116. 2005-05-01 Stockholm NA NA
# 6 106 2005-01-01 Malmo 1.5 1.42
# 7 108. 2005--02-01 Malmo 2.5 2.33
# 8 110 2005-03-01 Malmo 3 2.73
# 9 113 2005-04-01 Malmo 4 3.54
# 10 117 2005-05-01 Malmo NA NA
You will get NAs for the last month as we do not know the rate in the following year. Alternatively you can do it with lag
instead of lead
if you want to work out change from previous, but then you'll get NAs for the first month.