I have the following data frame:
library(tidyverse)
v1 = c(1 , NA, 3, 5, NA, NA, 2, 12, NA, 5, NA, 0, 1, 2, 6, 8)
alt = rnorm(length(v1), 0, 1)
tb = tibble(v1, alt)
print(tb)
A tibble: 16 × 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 NA 0.726
3 3 0.667
4 5 0.955
5 NA -1.68
6 NA -1.21
7 2 -1.96
8 12 1.47
9 NA 0.372
10 5 1.07
11 NA 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.250
I need to fill NAs
in v1
using the mutate. The idea is that when there is one NA
in v1
, it will be filled by the multiplication between the variable alt
and the value of v1
prior to the NA.
I solve this using loop for
, but this may take time to depend on the data set.
for (i in 1:length(v1)) {
if( is.na(tb[i, 'v1']) ){
tb[i, 'v1'] = tb[i-1, 'v1']*tb[i, 'alt']
}
}
This yields:
A tibble: 16 × 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 0.726 0.726
3 3 0.667
4 5 0.955
5 -8.38 -1.68
6 10.1 -1.21
7 2 -1.96
8 12 1.47
9 4.47 0.372
10 5 1.07
11 2.65 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.250
My question is: How do I fill NAs
using my condition and mutate
or others dplyr
verbs ?
CodePudding user response:
Something closer to what you are looking for is:
tb %>%
mutate(v1 = unlist(accumulate2(v1, alt[-1], ~if(is.na(..2))..3*..1 else ..2)))
# A tibble: 16 x 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 0.726 0.726
3 3 0.667
4 5 0.955
5 -8.4 -1.68
6 10.2 -1.21
7 2 -1.96
8 12 1.47
9 4.46 0.372
10 5 1.07
11 2.66 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.25
CodePudding user response:
You could make use of cumprod
:
tb %>%
group_by(id = cumsum(!is.na(v1))) %>%
mutate(v1 = v1[1]*cumprod(alt^is.na(v1)))
# A tibble: 16 x 3
# Groups: id [11]
v1 alt id
<dbl> <dbl> <int>
1 1 0.495 1
2 0.726 0.726 1
3 3 0.667 2
4 5 0.955 3
5 -8.4 -1.68 3
6 10.2 -1.21 3
7 2 -1.96 4
8 12 1.47 5
9 4.46 0.372 5
10 5 1.07 6
11 2.66 0.531 6
12 0 0.102 7
13 1 1.34 8
14 2 0.0872 9
15 6 -0.391 10
16 8 -0.25 11
In Base R:
prd <- function(a, b){
ifelse(is.na(a), prd(b * c(1,head(a,-1)),b), a)
}
transform(tb, v1 = prd(v1, alt))
v1 alt
1 1.000 0.4950
2 0.726 0.7260
3 3.000 0.6670
4 5.000 0.9550
5 -8.400 -1.6800
6 10.164 -1.2100
7 2.000 -1.9600
8 12.000 1.4700
9 4.464 0.3720
10 5.000 1.0700
11 2.655 0.5310
12 0.000 0.1020
13 1.000 1.3400
14 2.000 0.0872
15 6.000 -0.3910
16 8.000 -0.2500
CodePudding user response:
tb %>%
mutate(
v1 = coalesce(v1, lag(v1) * alt)
)
# # A tibble: 16 × 2
# v1 alt
# <dbl> <dbl>
# 1 1 0.223
# 2 0.377 0.377
# 3 3 0.469
# 4 5 -0.320
# 5 0.311 0.0623
# 6 NA -0.460
# 7 2 0.737
# 8 12 -0.946
# 9 -31.6 -2.63
# 10 5 -0.632
# 11 -0.134 -0.0268
# 12 0 0.329
# 13 1 -2.01
# 14 2 1.41
# 15 6 -1.22
# 16 8 -0.125
Note that I have taken you literally with
when there is one
NA
inv1
, it will be filled by the multiplication between the variablealt
and the value ofv1
prior to theNA
.
As you can see above, when there are 2 NAs (rows 5 and 6), only the first one is filled.