Home > Mobile >  Create variable for percentage change between two values based on group
Create variable for percentage change between two values based on group

Time:12-20

I want to calculate the percentage difference based on groups in b (A,A;B,B;C,C) of a data.frame. The new variable rate should contain the change in c based on the time stamps a (i.e. 3 and 7, while the smaller number is always the previous). It should be NA, when there is just one time stamp for a given group.

I tried to use the common methods, but i always get a vector of zeros. Example data:

a <- c(3,7,3,7,3,7,3,3,7,3,7,3,7,7)
b <- c("a", "a", "b", "b", "c", "c", "d", "e", "e", "f","f", "g", "g", "h")
c <- runif(14, min=80, max=100)
df <- data.frame(a,b,c)

I tried the common approach with using group_by, mutate and lag i.e.

df %>% group_by(a,b) %>%
  mutate(rate = 100 * (c - lag(c))/lag(c))

I also tried arrange before using lag, but the result i get is always 0. I think its because there is not a time series which is why we cant use lag. There must be an easy way to achieve this with using regular functions.

CodePudding user response:

If you use lag() to define your rate, you will end up with "NA" for every b == 3, and the correct value for b == 7 when you group_by "b". Also, c() is a Primitive function in R and it's best not to use "c" as the name of a variable.

Is this the outcome you're after?

library(tidyverse)
a <- c(3,7,3,7,3,7,3,3,7,3,7,3,7,7)
b <- c("a", "a", "b", "b", "c", "c", "d", "e", "e", "f","f", "g", "g", "h")
d <- c(80, 100, runif(12, min=80, max=100))
df <- data.frame(a,b,d)

df %>% group_by(b) %>%
  mutate(rate = 100 * (d - lag(d, default = NA))/lag(d, default = NA))
#> # A tibble: 14 × 4
#> # Groups:   b [8]
#>        a b         d   rate
#>    <dbl> <chr> <dbl>  <dbl>
#>  1     3 a      80    NA   
#>  2     7 a     100    25   
#>  3     3 b      88.0  NA   
#>  4     7 b      91.1   3.54
#>  5     3 c      95.1  NA   
#>  6     7 c      82.7 -13.1 
#>  7     3 d      92.6  NA   
#>  8     3 e      84.1  NA   
#>  9     7 e      91.8   9.20
#> 10     3 f      81.9  NA   
#> 11     7 f      93.6  14.4 
#> 12     3 g      88.7  NA   
#> 13     7 g      80.6  -9.11
#> 14     7 h      99.2  NA

Created on 2021-12-20 by the reprex package (v2.0.1)

You can be more flexible with ifelse()'s too, e.g. if you want NA's for cases where you have a single group but zeros for cases where a == 3:

library(tidyverse)
a <- c(3,7,3,7,3,7,3,3,7,3,7,3,7,7)
b <- c("a", "a", "b", "b", "c", "c", "d", "e", "e", "f","f", "g", "g", "h")
d <- c(80, 100, runif(12, min=80, max=100))
df <- data.frame(a,b,d)

df %>% group_by(b) %>%
  mutate(group_number = n()) %>%
  mutate(rate = ifelse(group_number == 1, NA, ifelse(a == 7, 100 * (d - lag(d, default = NA))/lag(d, default = NA), 0))) %>%
  select(-group_number) %>%
  ungroup()
#> # A tibble: 14 × 4
#>        a b         d   rate
#>    <dbl> <chr> <dbl>  <dbl>
#>  1     3 a      80     0   
#>  2     7 a     100    25   
#>  3     3 b      95.8   0   
#>  4     7 b      83.9 -12.5 
#>  5     3 c      87.0   0   
#>  6     7 c      81.5  -6.26
#>  7     3 d      97.0  NA   
#>  8     3 e      99.1   0   
#>  9     7 e      82.6 -16.6 
#> 10     3 f      82.3   0   
#> 11     7 f      96.0  16.7 
#> 12     3 g      99.5   0   
#> 13     7 g      93.4  -6.09
#> 14     7 h      86.8  NA

Created on 2021-12-20 by the reprex package (v2.0.1)

  • Related