This is more of an extension to a question from earlier I was trying to work out the percentage change of variables between years. I now wish to extend this by doing it so I can see the change in countries through the years. Some sample data and code included below.
This code works out % change used, but I need to add in counties to this. Think some sort of group_by()
on the county names might work here, although can't think of ways to use it here
df |>
mutate(pdiff = 100*(gdp_per_capita - lag(gdp_per_capita))/gdp_per_capita)
data sample
df
Country_Name year gdp_per_capita
Albania 2018 5287.664
Albania 2019 5396.216
Albania 2020 5332.160
Albania 2021 6494.386
Algeria 2018 4142.019
Algeria 2019 3989.668
Algeria 2020 3306.858
Algeria 2021 3765.035
Bosnia 2018 6070.353
Bosnia 2019 6119.762
Bosnia 2020 6082.367
Bosnia 2021 6916.438
Croatia 2018 15227.560
Croatia 2019 15311.767
Croatia 2020 14132.487
Croatia 2021 17398.766
CodePudding user response:
It could be done by adding the group_by
library(dplyr)
df |>
group_by(Country_Name) |>
mutate(pdiff = 100*(gdp_per_capita - lag(gdp_per_capita))/gdp_per_capita) |>
ungroup()
CodePudding user response:
I tried diff
instead of using lag
, but basically you need to group by country and do the same calculations. I dropped arrange
in there to be sure about the order of the data.
library(dplyr)
df %>%
group_by(Country_Name) %>%
arrange(Country_Name, year) %>%
mutate(pdiff = 100 * c(NA, diff(gdp_per_capita) / gdp_per_capita[-1]))
# mutate(pdiff = 100 * c(NA, diff(gdp_per_capita) / gdp_per_capita[-n()]))
## better to divide by previous row, i.e. lag(gdp_per_capita)
#> # A tibble: 16 x 4
#> # Groups: Country_Name [4]
#> Country_Name year gdp_per_capita pdiff
#> <chr> <int> <dbl> <dbl>
#> 1 Albania 2018 5288. NA
#> 2 Albania 2019 5396. 2.01
#> 3 Albania 2020 5332. -1.20
#> 4 Albania 2021 6494. 17.9
#> 5 Algeria 2018 4142. NA
#> 6 Algeria 2019 3990. -3.82
#> 7 Algeria 2020 3307. -20.6
#> 8 Algeria 2021 3765. 12.2
#> 9 Bosnia 2018 6070. NA
#> 10 Bosnia 2019 6120. 0.807
#> 11 Bosnia 2020 6082. -0.615
#> 12 Bosnia 2021 6916. 12.1
#> 13 Croatia 2018 15228. NA
#> 14 Croatia 2019 15312. 0.550
#> 15 Croatia 2020 14132. -8.34
#> 16 Croatia 2021 17399. 18.8
Data:
df <- read.table(text = "Country_Name year gdp_per_capita
Albania 2018 5287.664
Albania 2019 5396.216
Albania 2020 5332.160
Albania 2021 6494.386
Algeria 2018 4142.019
Algeria 2019 3989.668
Algeria 2020 3306.858
Algeria 2021 3765.035
Bosnia 2018 6070.353
Bosnia 2019 6119.762
Bosnia 2020 6082.367
Bosnia 2021 6916.438
Croatia 2018 15227.560
Croatia 2019 15311.767
Croatia 2020 14132.487
Croatia 2021 17398.766", header = T, stringsAsFactor = F)