Home > Net >  Fill in NAs using mutate
Fill in NAs using mutate

Time:09-21

I have the following data frame:

library(tidyverse)

v1 = c(1 , NA, 3, 5, NA, NA, 2, 12, NA, 5, NA, 0, 1, 2, 6, 8)
alt = rnorm(length(v1), 0, 1)
tb = tibble(v1, alt)

print(tb)

A tibble: 16 × 2
      v1     alt
   <dbl>   <dbl>
 1     1  0.495 
 2    NA  0.726 
 3     3  0.667 
 4     5  0.955 
 5    NA -1.68  
 6    NA -1.21  
 7     2 -1.96  
 8    12  1.47  
 9    NA  0.372 
10     5  1.07  
11    NA  0.531 
12     0  0.102 
13     1  1.34  
14     2  0.0872
15     6 -0.391 
16     8 -0.250 

I need to fill NAs in v1 using the mutate. The idea is that when there is one NA in v1, it will be filled by the multiplication between the variable alt and the value of v1 prior to the NA.

I solve this using loop for, but this may take time to depend on the data set.

for (i in 1:length(v1)) {
  if( is.na(tb[i, 'v1']) ){
    tb[i, 'v1'] = tb[i-1, 'v1']*tb[i, 'alt']
  }
}

This yields:

A tibble: 16 × 2
       v1     alt
    <dbl>   <dbl>
 1  1      0.495 
 2  0.726  0.726 
 3  3      0.667 
 4  5      0.955 
 5 -8.38  -1.68  
 6 10.1   -1.21  
 7  2     -1.96  
 8 12      1.47  
 9  4.47   0.372 
10  5      1.07  
11  2.65   0.531 
12  0      0.102 
13  1      1.34  
14  2      0.0872
15  6     -0.391 
16  8     -0.250 

My question is: How do I fill NAs using my condition and mutate or others dplyr verbs ?

CodePudding user response:

Something closer to what you are looking for is:

tb %>%
   mutate(v1 = unlist(accumulate2(v1, alt[-1], ~if(is.na(..2))..3*..1 else ..2)))

# A tibble: 16 x 2
       v1     alt
    <dbl>   <dbl>
 1  1      0.495 
 2  0.726  0.726 
 3  3      0.667 
 4  5      0.955 
 5 -8.4   -1.68  
 6 10.2   -1.21  
 7  2     -1.96  
 8 12      1.47  
 9  4.46   0.372 
10  5      1.07  
11  2.66   0.531 
12  0      0.102 
13  1      1.34  
14  2      0.0872
15  6     -0.391 
16  8     -0.25  

CodePudding user response:

You could make use of cumprod:

tb %>%
  group_by(id = cumsum(!is.na(v1))) %>%
  mutate(v1 = v1[1]*cumprod(alt^is.na(v1)))

# A tibble: 16 x 3
# Groups:   id [11]
       v1     alt    id
    <dbl>   <dbl> <int>
 1  1      0.495      1
 2  0.726  0.726      1
 3  3      0.667      2
 4  5      0.955      3
 5 -8.4   -1.68       3
 6 10.2   -1.21       3
 7  2     -1.96       4
 8 12      1.47       5
 9  4.46   0.372      5
10  5      1.07       6
11  2.66   0.531      6
12  0      0.102      7
13  1      1.34       8
14  2      0.0872     9
15  6     -0.391     10
16  8     -0.25      11

In Base R:

prd <- function(a, b){
  ifelse(is.na(a), prd(b * c(1,head(a,-1)),b), a)
}


transform(tb, v1 = prd(v1, alt))
       v1     alt
1   1.000  0.4950
2   0.726  0.7260
3   3.000  0.6670
4   5.000  0.9550
5  -8.400 -1.6800
6  10.164 -1.2100
7   2.000 -1.9600
8  12.000  1.4700
9   4.464  0.3720
10  5.000  1.0700
11  2.655  0.5310
12  0.000  0.1020
13  1.000  1.3400
14  2.000  0.0872
15  6.000 -0.3910
16  8.000 -0.2500

CodePudding user response:

tb %>%
  mutate(
    v1 = coalesce(v1, lag(v1) * alt)
  )
# # A tibble: 16 × 2
#         v1     alt
#      <dbl>   <dbl>
#  1   1      0.223 
#  2   0.377  0.377 
#  3   3      0.469 
#  4   5     -0.320 
#  5   0.311  0.0623
#  6  NA     -0.460 
#  7   2      0.737 
#  8  12     -0.946 
#  9 -31.6   -2.63  
# 10   5     -0.632 
# 11  -0.134 -0.0268
# 12   0      0.329 
# 13   1     -2.01  
# 14   2      1.41  
# 15   6     -1.22  
# 16   8     -0.125 

Note that I have taken you literally with

when there is one NA in v1, it will be filled by the multiplication between the variable alt and the value of v1 prior to the NA.

As you can see above, when there are 2 NAs (rows 5 and 6), only the first one is filled.

  • Related