r Calculation based on condition-CodePudding

I have a dataframe like :

Zone	year	value
Afghanistan	2013	8.6
Afghanistan	2018	11.1
Afrique du Sud	2013	2.2
Afrique du Sud	2018	3.3
Albanie	2013	0.1
Albanie	2018	0.1

and I would like to create a dataframe like:

Zone	Evol
Afghanistan	-4
Afrique du Sud	-50

etc.

Where Evol is equal to : round( ( (Value in 2013) - (Value in 2018) ) / Value in 2018 ) * 100)

probaly using while or substring?

Thank you very much in advance

CodePudding user response：

First you have to use pivot_wider on your dataframe to get a wide format, then you can summarise by zone

library(tidyverse)

ex = data.frame(zone = c("Afg","Afg","Afr","Afr","Alb","Alb"),
                year = c(2013, 2018, 2013, 2018, 2013, 2018),
                value = c(8.6, 11.1, 2.2, 3.3, 0.1, 0.1))

ex %>% 
  pivot_wider(names_from = year, values_from = value) %>% 
  group_by(zone) %>% 
  summarise(evol = (`2013` - `2018`) / `2018` * 100)

#> # A tibble: 3 x 2
#>   zone   evol
#> * <chr> <dbl>
#> 1 Afg   -22.5
#> 2 Afr   -33.3
#> 3 Alb     0

^{Created on 2021-12-10 by the reprex package (v0.3.0)}

CodePudding user response：

1) Sort group by Zone and perform the computation and ungroup.

library(dplyr)

DF %>% 
  group_by(Zone) %>% 
  summarize(Evol = round(100 * diff(-value) / last(value)), .groups = "drop")

giving:

# A tibble: 3 x 2
  Zone            Evol
  <chr>          <dbl>
1 Afghanistan      -23
2 Afrique du Sud   -33
3 Albanie            0

2) If the data can contain other rows then remove them and ensure that every group consists of 2013 and 2018 lines giving the same result. This is the same as above except we have added the ## lines. If year is double rather than integer then use 2013 and 2018 in place of 2013L and 2018L.

library(dplyr)

DF %>% 
  filter(year %in% c(2013L, 2018L)) %>% ##
  arrange(Zone, year) %>%  ##
  group_by(Zone) %>% 
  filter(identical(year, c(2013L, 2018L))) %>% ##
  summarize(Evol = round(100 * diff(-value) / last(value)), .groups = "drop")

3) If you wanted to use only base R then (1) can be done like this:

aggregate(list(Evol = DF$value), DF["Zone"], \(x) round(100 * diff(-x) / x[2]))

Note

The input in reproducible form is assumed to be the following. Note that we assume that year is integer.

DF <- 
structure(list(Zone = c("Afghanistan", "Afghanistan", "Afrique du Sud", 
"Afrique du Sud", "Albanie", "Albanie"), year = c(2013L, 2018L, 
2013L, 2018L, 2013L, 2018L), value = c(8.6, 11.1, 2.2, 3.3, 0.1, 
0.1)), class = "data.frame", row.names = c(NA, -6L))