Home > Software design >  r Calculation based on condition
r Calculation based on condition

Time:12-11

I have a dataframe like :

Zone year value
Afghanistan 2013 8.6
Afghanistan 2018 11.1
Afrique du Sud 2013 2.2
Afrique du Sud 2018 3.3
Albanie 2013 0.1
Albanie 2018 0.1

and I would like to create a dataframe like:

Zone Evol
Afghanistan -4
Afrique du Sud -50

etc.

Where Evol is equal to : round( ( (Value in 2013) - (Value in 2018) ) / Value in 2018 ) * 100)

probaly using while or substring?

Thank you very much in advance

CodePudding user response:

First you have to use pivot_wider on your dataframe to get a wide format, then you can summarise by zone

library(tidyverse)

ex = data.frame(zone = c("Afg","Afg","Afr","Afr","Alb","Alb"),
                year = c(2013, 2018, 2013, 2018, 2013, 2018),
                value = c(8.6, 11.1, 2.2, 3.3, 0.1, 0.1))

ex %>% 
  pivot_wider(names_from = year, values_from = value) %>% 
  group_by(zone) %>% 
  summarise(evol = (`2013` - `2018`) / `2018` * 100)

#> # A tibble: 3 x 2
#>   zone   evol
#> * <chr> <dbl>
#> 1 Afg   -22.5
#> 2 Afr   -33.3
#> 3 Alb     0

Created on 2021-12-10 by the reprex package (v0.3.0)

CodePudding user response:

1) Sort group by Zone and perform the computation and ungroup.

library(dplyr)

DF %>% 
  group_by(Zone) %>% 
  summarize(Evol = round(100 * diff(-value) / last(value)), .groups = "drop")

giving:

# A tibble: 3 x 2
  Zone            Evol
  <chr>          <dbl>
1 Afghanistan      -23
2 Afrique du Sud   -33
3 Albanie            0

2) If the data can contain other rows then remove them and ensure that every group consists of 2013 and 2018 lines giving the same result. This is the same as above except we have added the ## lines. If year is double rather than integer then use 2013 and 2018 in place of 2013L and 2018L.

library(dplyr)

DF %>% 
  filter(year %in% c(2013L, 2018L)) %>% ##
  arrange(Zone, year) %>%  ##
  group_by(Zone) %>% 
  filter(identical(year, c(2013L, 2018L))) %>% ##
  summarize(Evol = round(100 * diff(-value) / last(value)), .groups = "drop")

3) If you wanted to use only base R then (1) can be done like this:

aggregate(list(Evol = DF$value), DF["Zone"], \(x) round(100 * diff(-x) / x[2]))

Note

The input in reproducible form is assumed to be the following. Note that we assume that year is integer.

DF <- 
structure(list(Zone = c("Afghanistan", "Afghanistan", "Afrique du Sud", 
"Afrique du Sud", "Albanie", "Albanie"), year = c(2013L, 2018L, 
2013L, 2018L, 2013L, 2018L), value = c(8.6, 11.1, 2.2, 3.3, 0.1, 
0.1)), class = "data.frame", row.names = c(NA, -6L))
  • Related