Home > Software design >  Is there a way to omit NAs when using the dplyr package Lag function in R?
Is there a way to omit NAs when using the dplyr package Lag function in R?

Time:07-09

I have two columns of data and I want to subtract one from the previous row entry of the other. The lag() function in dplyr is perfect for this, but unfortunately I have NAs in the second row of data. This means that when calculating the new value for rows when the previous row has an NA in the second column is resulting in NA's (see below table):

Value 1 Value 2 New Value
NA 2 NA
13 3 11
6 NA 3
6 4 NA
5 4 1

Is there a way to omit then NAs and have it use the last actual value in the second column instead? Like an na.rm=TRUE that can be added? The result would look like:

Value 1 Value 2 New Value
NA 2 NA
13 3 11
6 NA 3
6 4 3
5 4 1

CodePudding user response:

One option would be to mutate Value 2 beforehand and fill the NA with the last non-NA value:

df %>%
  tidyr::fill(`Value 2`, .direction = "down")

CodePudding user response:

After making that New.Value, you can fill the NA with previous value using fill() from tidyr package.

library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~Value.1, ~Value.2, ~New.Value,
    NA, 2, NA,
   13,3,11,
   6,NA,3,
   6,4,NA,
   5,4,1,
  )

df

#> # A tibble: 5 × 3
#>   Value.1 Value.2 New.Value
#>     <dbl>   <dbl>     <dbl>
#> 1      NA       2        NA
#> 2      13       3        11
#> 3       6      NA         3
#> 4       6       4        NA
#> 5       5       4         1

df %>% 
  fill(New.Value, .direction = "down")

#> # A tibble: 5 × 3
#>   Value.1 Value.2 New.Value
#>     <dbl>   <dbl>     <dbl>
#> 1      NA       2        NA
#> 2      13       3        11
#> 3       6      NA         3
#> 4       6       4         3
#> 5       5       4         1

Created on 2022-07-08 by the reprex package (v2.0.1)

  • Related