I currently work with multiple large datasets of the same row number but different column numbers. Now I need to calculate the rate of change between columns and add it to either a new object or to the existing object to go on with my analysis.
In my research on the web I usually only encounterd people trying to figure out rate of change in a column but not between those. Is the easiest way to just flip all my data?
I am very sorry for my vague description of my problem as R and english are not my first languages.
I hope you can still show me the direction to further my understanding of R.
Thank you in advance for any tipps you might have!
CodePudding user response:
I recommend joining all the data together and then convert it into a 3NF normalized long format table:
library(tidyverse)
data1 <- tibble(
country = c("A", "B", "C"),
gdp_2020 = c(1, 8, 10),
gdp_2021 = c(1, 8, 10),
population_2010 = c(5e3, 6e3, 6e3),
population_2020 = c(5.5e3, 6.8e3, 6e3)
)
data1
#> # A tibble: 3 x 5
#> country gdp_2020 gdp_2021 population_2010 population_2020
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 A 1 1 5000 5500
#> 2 B 8 8 6000 6800
#> 3 C 10 10 6000 6000
data2 <- tibble(
country = c("A", "B", "C"),
population_2021 = c(7e3, 8e3, 7e3),
population_2022 = c(7e3, 7e3, 10e3)
)
data2
#> # A tibble: 3 x 3
#> country population_2021 population_2022
#> <chr> <dbl> <dbl>
#> 1 A 7000 7000
#> 2 B 8000 7000
#> 3 C 7000 10000
list(
data1,
data2
) %>%
reduce(full_join) %>%
pivot_longer(matches("^(gdp|population)")) %>%
separate(name, into = c("variable", "year"), sep = "_") %>%
type_convert() %>%
arrange(country, variable, year) %>%
group_by(variable, country) %>%
mutate(
# NA for the first value because it does not have a precursor to calculate change
change_rate = (value - lag(value)) / (year - lag(year))
)
#> Joining, by = "country"
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> country = col_character(),
#> variable = col_character(),
#> year = col_double()
#> )
#> # A tibble: 18 x 5
#> # Groups: variable, country [6]
#> country variable year value change_rate
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 A gdp 2020 1 NA
#> 2 A gdp 2021 1 0
#> 3 A population 2010 5000 NA
#> 4 A population 2020 5500 50
#> 5 A population 2021 7000 1500
#> 6 A population 2022 7000 0
#> 7 B gdp 2020 8 NA
#> 8 B gdp 2021 8 0
#> 9 B population 2010 6000 NA
#> 10 B population 2020 6800 80
#> 11 B population 2021 8000 1200
#> 12 B population 2022 7000 -1000
#> 13 C gdp 2020 10 NA
#> 14 C gdp 2021 10 0
#> 15 C population 2010 6000 NA
#> 16 C population 2020 6000 0
#> 17 C population 2021 7000 1000
#> 18 C population 2022 10000 3000
Created on 2021-12-16 by the reprex package (v2.0.1)
Example: rate of change in the second row (gdp of country A) is 0 because it was the same in both 2021 and 2020.