I have a dataset with a lot of NAs. The NAs occur in predictable patterns, representing a between-subjects manipulation. Example:
Outcome (one value per row): NA NA NA NA NA 0 1 NA NA NA NA NA 1 2 NA NA NA NA NA
New Variable Column: (empty; this is the column to fill in)
I want the New Variable Column to capture each instance of NA, NA, NA, NA, NA. How do I tell R to search for a run of 5 NAs, then output a new name (let's call it 5X) for that run in a different column? It doesn't matter to me whether the 5X label appears only once per run or on every row of the run.
CodePudding user response:
I think you want run-length encoding; see the rle() function. Here is an example. I'm not sure I completely follow the output you want, but either way the RLE lets you find runs of 5 NAs (or any other number of NAs) in a row.
d <- data.frame(
variable = c(NA, NA, NA, NA, NA, 0, 1, NA, NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA)
)
x <- rle(is.na(d$variable))
x
#> Run Length Encoding
#> lengths: int [1:5] 5 2 5 2 5
#> values : logi [1:5] TRUE FALSE TRUE FALSE TRUE
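# One label per row. lapply() always returns a list; sapply() would simplify
# to a matrix (and break the combine step) if every run had the same length.
d$new_column <- unlist(lapply(seq_along(x$values), function(i) {
  if (x$values[i] && x$lengths[i] == 5) {
    rep("Infrequent", x$lengths[i])
  } else {
    rep("Frequent", x$lengths[i])
  }
}))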
d
#> variable new_column
#> 1 NA Infrequent
#> 2 NA Infrequent
#> 3 NA Infrequent
#> 4 NA Infrequent
#> 5 NA Infrequent
#> 6 0 Frequent
#> 7 1 Frequent
#> 8 NA Infrequent
#> 9 NA Infrequent
#> 10 NA Infrequent
#> 11 NA Infrequent
#> 12 NA Infrequent
#> 13 1 Frequent
#> 14 2 Frequent
#> 15 NA Infrequent
#> 16 NA Infrequent
#> 17 NA Infrequent
#> 18 NA Infrequent
#> 19 NA Infrequent
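The same idea can probably be written without the loop, since rep() accepts one count per element. This is only a sketch under a couple of assumptions: the "5X" label comes from the question, the name run_is_5na is made up here, and leaving every other row as NA is a guess at the output the asker wants.

```r
d <- data.frame(
  variable = c(NA, NA, NA, NA, NA, 0, 1, NA, NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA)
)

# Encode the column as runs of NA / non-NA
x <- rle(is.na(d$variable))

# TRUE for each run that is exactly five NAs long (hypothetical name)
run_is_5na <- x$values & x$lengths == 5

# One label per run, expanded back out to one label per row
# (assumption: rows outside a 5-NA run get NA rather than some other label)
d$new_column <- rep(ifelse(run_is_5na, "5X", NA_character_), times = x$lengths)
```

Changing the 5 in run_is_5na is enough to look for runs of a different length.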
CodePudding user response:
Here is an alternative approach using data.table::rleid:
library(data.table)
setDT(d)[,
  nc := fifelse(.N >= 5 & is.na(variable[1]), "infreq", "freq"),
  by = rleid(variable)
]
Output:
variable nc
<num> <char>
1: NA infreq
2: NA infreq
3: NA infreq
4: NA infreq
5: NA infreq
6: 0 freq
7: 1 freq
8: NA infreq
9: NA infreq
10: NA infreq
11: NA infreq
12: NA infreq
13: 1 freq
14: 2 freq
15: NA infreq
16: NA infreq
17: NA infreq
18: NA infreq
19: NA infreq
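Note that the answer above tests for runs of five or more, so a run of six NAs would be labelled as well. If only runs of exactly five should count, a variant along these lines might work. Treat it as a sketch: the column name run and the "5X" / NA labels are assumptions taken from the question's wording, not part of the original answer.

```r
library(data.table)

dt <- data.table(
  variable = c(NA, NA, NA, NA, NA, 0, 1, NA, NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA)
)

# Group on runs of NA / non-NA rather than on the raw values
# ("run" is a hypothetical helper column)
dt[, run := rleid(is.na(variable))]

# Label a run only when it is exactly five rows long and made of NAs;
# every other row is left as NA (assumption about the desired output)
dt[, nc := fifelse(.N == 5L & is.na(variable[1]), "5X", NA_character_), by = run]
```

Grouping on is.na(variable) also means consecutive non-NA values fall into a single run, which makes the run column easier to read when checking the result.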