I have a dataframe like :
Zone | year | value |
---|---|---|
Afghanistan | 2013 | 8.6 |
Afghanistan | 2018 | 11.1 |
Afrique du Sud | 2013 | 2.2 |
Afrique du Sud | 2018 | 3.3 |
Albanie | 2013 | 0.1 |
Albanie | 2018 | 0.1 |
and I would like to create a dataframe like:
Zone | Evol |
---|---|
Afghanistan | -4 |
Afrique du Sud | -50 |
etc.
Where Evol is equal to : round( ( (Value in 2013) - (Value in 2018) ) / Value in 2018 ) * 100)
probaly using while or substring?
Thank you very much in advance
CodePudding user response:
First you have to use pivot_wider
on your dataframe to get a wide format, then you can summarise
by zone
library(tidyverse)
ex = data.frame(zone = c("Afg","Afg","Afr","Afr","Alb","Alb"),
year = c(2013, 2018, 2013, 2018, 2013, 2018),
value = c(8.6, 11.1, 2.2, 3.3, 0.1, 0.1))
ex %>%
pivot_wider(names_from = year, values_from = value) %>%
group_by(zone) %>%
summarise(evol = (`2013` - `2018`) / `2018` * 100)
#> # A tibble: 3 x 2
#> zone evol
#> * <chr> <dbl>
#> 1 Afg -22.5
#> 2 Afr -33.3
#> 3 Alb 0
Created on 2021-12-10 by the reprex package (v0.3.0)
CodePudding user response:
1) Sort group by Zone and perform the computation and ungroup.
library(dplyr)
DF %>%
group_by(Zone) %>%
summarize(Evol = round(100 * diff(-value) / last(value)), .groups = "drop")
giving:
# A tibble: 3 x 2
Zone Evol
<chr> <dbl>
1 Afghanistan -23
2 Afrique du Sud -33
3 Albanie 0
2) If the data can contain other rows then remove them and ensure that every group consists of 2013 and 2018 lines giving the same result. This is the same as above except we have added the ## lines. If year is double rather than integer then use 2013 and 2018 in place of 2013L and 2018L.
library(dplyr)
DF %>%
filter(year %in% c(2013L, 2018L)) %>% ##
arrange(Zone, year) %>% ##
group_by(Zone) %>%
filter(identical(year, c(2013L, 2018L))) %>% ##
summarize(Evol = round(100 * diff(-value) / last(value)), .groups = "drop")
3) If you wanted to use only base R then (1) can be done like this:
aggregate(list(Evol = DF$value), DF["Zone"], \(x) round(100 * diff(-x) / x[2]))
Note
The input in reproducible form is assumed to be the following. Note that we assume that year is integer.
DF <-
structure(list(Zone = c("Afghanistan", "Afghanistan", "Afrique du Sud",
"Afrique du Sud", "Albanie", "Albanie"), year = c(2013L, 2018L,
2013L, 2018L, 2013L, 2018L), value = c(8.6, 11.1, 2.2, 3.3, 0.1,
0.1)), class = "data.frame", row.names = c(NA, -6L))