"Lead" function in R not working correctly

Time:05-01

I am trying to use the lead function in R so that the last value in val.test for each var group becomes the first value in val.test.lead, with the remaining values shifted down by one row.

library(dplyr)

set.seed(14)
df <- data.frame(
  var = c("A","B","C","D","E","F","G","H","I","J","K","L"),
  val.test = rnorm(12,4,5)
)

df$var <- as.factor(df$var)

df <- df %>% 
  dplyr::group_by(var) %>% 
  dplyr::mutate(val.test.lead = lead(val.test, default = first(val.test)))

#The output is

  var   val.test val.test.lead
   <fct>    <dbl>         <dbl>
 1 A        0.691         0.691
 2 B       12.6          12.6  
 3 C       14.6          14.6  
 4 D       11.5          11.5  
 5 E        3.82          3.82 
 6 F       10.2          10.2  
 7 G        3.68          3.68 
 8 H        9.34          9.34 
 9 I        2.12          2.12 
10 J        9.22          9.22 
11 K        2.09          2.09 
12 L        5.50          5.50 

#The expected output is

   var   val.test val.test.lead
   <fct>    <dbl>         <dbl>
 1 A        0.691         5.50
 2 B       12.6           0.691
 3 C       14.6          12.6
 4 D       11.5          14.6
 5 E        3.82         11.5
 6 F       10.2           3.82
 7 G        3.68         10.2
 8 H        9.34          3.68
 9 I        2.12          9.34
10 J        9.22          2.12
11 K        2.09          9.22
12 L        5.50          2.09

CodePudding user response:

It looks like you need lag() instead of lead(). (I also don't understand the group_by() there: every value of var is unique, so each group has a single row.)

# df is the original, ungrouped data frame
df %>% 
  dplyr::mutate(val.test.lead = dplyr::lag(val.test, default = dplyr::last(val.test)))

   var val.test val.test.lead
1    A     0.69          5.50
2    B    12.59          0.69
3    C    14.61         12.59
4    D    11.49         14.61
5    E     3.82         11.49
6    F    10.16          3.82
7    G     3.68         10.16
8    H     9.34          3.68
9    I     2.12          9.34
10   J     9.22          2.12
11   K     2.09          9.22
12   L     5.50          2.09
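
As a small illustration of why the grouped call in the question returned every value unchanged: with group_by(var), each group holds one row, so lead() has no "next" element and falls back to default, which is first(val.test), i.e. the row's own value. The sketch below uses a single number as a stand-in for such a one-row group; the variable name x is only for illustration.

```r
library(dplyr)

# A length-one vector stands in for a single-row group.
x <- 0.691

# No next element exists, so lead() returns the default: first(x), which is x.
lead(x, default = first(x))
#> [1] 0.691

# lag() behaves the same way on a one-row group.
lag(x, default = last(x))
#> [1] 0.691
```

So swapping lead() for lag() is not enough on its own while the data is still grouped by var; the group_by() has to go as well (or the data has to be ungrouped first).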

CodePudding user response:

What you are looking for is not lead, but lag.
Comparing lead to lag:

library(dplyr)

set.seed(14)

df <- data.frame(
  var = c("A","B","C","D","E","F","G","H","I","J","K","L"),
  val.test = rnorm(12,4,5)
) %>% as_tibble()
df$var <- as.factor(df$var)
df %>% mutate(val.test.lag = lag(val.test, default = last(val.test)),
              val.test.lead = lead(val.test, default = first(val.test)))

Output

 # A tibble: 12 × 4
    var   val.test val.test.lag val.test.lead
    <fct>    <dbl>        <dbl>         <dbl>
  1 A        0.691        5.50         12.6  
  2 B       12.6          0.691        14.6  
  3 C       14.6         12.6          11.5  
  4 D       11.5         14.6           3.82 
  5 E        3.82        11.5          10.2  
  6 F       10.2          3.82          3.68 
  7 G        3.68        10.2           9.34 
  8 H        9.34         3.68          2.12 
  9 I        2.12         9.34          9.22 
 10 J        9.22         2.12          2.09 
 11 K        2.09         9.22          5.50 
 12 L        5.50         2.09          0.691

Created on 2022-04-30 by the reprex package (v2.0.1)
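
If the wrap-around shift is needed in more than one place, it could also be written as a small base R helper. This is only a sketch: shift_wrap is a hypothetical name, not a dplyr or base R function, and it assumes a plain atomic vector such as the numeric val.test column. A positive n behaves like lag() with wrap-around, a negative n like lead().

```r
# Hypothetical helper: circularly shift a vector by n positions.
# n > 0 moves values toward the end (like lag), n < 0 toward the start (like lead).
shift_wrap <- function(x, n = 1L) {
  len <- length(x)
  if (len == 0L) return(x)
  n <- n %% len              # reduce n to the range 0..len-1
  if (n == 0L) return(x)     # nothing to shift
  c(tail(x, n), head(x, len - n))
}

shift_wrap(1:5)
#> [1] 5 1 2 3 4

shift_wrap(1:5, -1)
#> [1] 2 3 4 5 1
```

With the data frame above, df$val.test.lag <- shift_wrap(df$val.test) should give the same column as lag(val.test, default = last(val.test)).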
