Data
Here is the data for my example:
#### Create Data ####
library(dplyr)  # provides %>% and as_tibble()

df <- data.frame(X1 = c(NA,1,1,1,0),
                 X2 = c(1,1,1,0,0),
                 X3 = c(1,1,NA,0,0),
                 X4 = c(1,1,1,1,NA),
                 X5 = c(1,1,1,0,NA),
                 X6 = c(1,NA,1,1,NA)) %>%
  as_tibble()
Problem
When you print the data, it looks like this:
# A tibble: 5 × 6
X1 X2 X3 X4 X5 X6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 1 1 1 1 1
2 1 1 1 1 1 NA
3 1 1 NA 1 1 1
4 1 0 0 1 0 1
5 0 0 0 NA NA NA
Rows 1-4 have sporadic, random missingness. Row 5 is different: after three zeroes in a row, a stopping rule for multiple "wrong" answers kicked in and the remaining items were recorded as NA. In theory I could just blindly replace every NA with 0 using the following code:
df %>%
  mutate(across(everything(),
                ~ replace(.,
                          is.na(.),
                          0)))
And every NA would become 0:
# A tibble: 5 × 6
X1 X2 X3 X4 X5 X6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 1 1 1 1 1
2 1 1 1 1 1 0
3 1 1 0 1 1 1
4 1 0 0 1 0 1
5 0 0 0 0 0 0
However, this does not faithfully address the problem. The random NAs are genuinely missing, whereas the NAs created by the stopping rule are really wrong answers. So I need a way to replace NAs with 0 only in cases where three 0s are recorded in a row, but I'm struggling to figure out how to do this.
CodePudding user response:
Using is.na we could paste0 each row's NA indicators into a string and check whether the number of matches with 111 is greater than zero, using stringi::stri_count, to create a flag. After that, replace the NAs with zeros in every row where the flag is set.
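num_NA <- 3
# +(is.na(df)) turns TRUE/FALSE into 1/0, so each row becomes a string such as "000111"
flag <- apply(+(is.na(df)), 1, paste0, collapse='') |>
  stringi::stri_count(regex=paste(rep(1, num_NA), collapse='')) |> base::`>`(0)
# in flagged rows only, replace NA with 0
df[flag, ] <- lapply(df[flag, ], \(x) replace(x, is.na(x), 0))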
df
# X1 X2 X3 X4 X5 X6
# 1 NA 1 1 1 1 1
# 2 1 1 1 1 1 NA
# 3 1 1 NA 1 1 1
# 4 1 0 0 1 0 1
# 5 0 0 0 0 0 0
Data:
df <- structure(list(X1 = c(NA, 1, 1, 1, 0),
                     X2 = c(1, 1, 1, 0, 0),
                     X3 = c(1, 1, NA, 0, 0),
                     X4 = c(1, 1, 1, 1, NA),
                     X5 = c(1, 1, 1, 0, NA),
                     X6 = c(1, NA, 1, 1, NA)),
                class = "data.frame", row.names = c(NA, -5L))
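Note that the flag above keys on three consecutive NAs, not on the three zeros that trigger the stopping rule. A minimal sketch of a stricter check follows, assuming the rule means "three 0s in a row, followed only by NAs up to the end of the row". The names `row_codes`, `stopped`, `trail_start` and `df_fixed` are made up for illustration, and the approach relies on every answer being a single digit (0 or 1) so that string positions line up with column positions.

```r
# Rebuild the example data so the block runs on its own
df <- data.frame(X1 = c(NA, 1, 1, 1, 0),
                 X2 = c(1, 1, 1, 0, 0),
                 X3 = c(1, 1, NA, 0, 0),
                 X4 = c(1, 1, 1, 1, NA),
                 X5 = c(1, 1, 1, 0, NA),
                 X6 = c(1, NA, 1, 1, NA))

# Encode each row as a string: "0"/"1" for answers, "N" for NA
row_codes <- apply(df, 1, function(x) {
  codes <- as.character(x)
  codes[is.na(x)] <- "N"
  paste0(codes, collapse = "")
})

# Assumed stopping-rule pattern: three zeros, then NAs to the end of the row
stopped <- grepl("000N+$", row_codes)

# Position where the trailing run of NAs starts in each row
trail_start <- regexpr("N+$", row_codes)

# Replace only the trailing NAs of the stopped rows with 0
df_fixed <- df
for (i in which(stopped)) {
  df_fixed[i, trail_start[i]:ncol(df_fixed)] <- 0
}
df_fixed
```

With this sketch, a random NA that occurs before the run of zeros in a stopped row is left untouched, since only the trailing NAs are overwritten.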
CodePudding user response:
This is sort of a non-answer, but too big for a comment. Doubling df:
df2 <- rbind(df, df)
df2
X1 X2 X3 X4 X5 X6
1 NA 1 1 1 1 1
2 1 1 1 1 1 NA
3 1 1 NA 1 1 1
4 1 0 0 1 0 1
5 0 0 0 NA NA NA
6 NA 1 1 1 1 1
7 1 1 1 1 1 NA
8 1 1 NA 1 1 1
9 1 0 0 1 0 1
10 0 0 0 NA NA NA
# fiddle with it
df2[3,] <- c(0,NA,0,NA,0,NA)
suspects <- which(rowSums(df2, na.rm = TRUE) == 0)
suspects
[1] 3 5 10
3 %in% rle(df2[suspects[3], ])$lengths
[1] TRUE
3 %in% rle(df2[suspects[1], ])$lengths
[1] FALSE
But since this is about 'faithfulness' in scoring what follows a series of answers, the above should only identify candidate rows, which rle can then check for 3 zeros in a row.
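One way that idea could be carried through is sketched below: rle is run on each row to find the first run of at least three zeros, and only the NAs after that run are set to 0. The helper name `zero_after_stop` and its argument `n_stop` are hypothetical, and reading the stopping rule as "everything missing after the run is a wrong answer" is an assumption.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(X1 = c(NA, 1, 1, 1, 0),
                 X2 = c(1, 1, 1, 0, 0),
                 X3 = c(1, 1, NA, 0, 0),
                 X4 = c(1, 1, 1, 1, NA),
                 X5 = c(1, 1, 1, 0, NA),
                 X6 = c(1, NA, 1, 1, NA))

# Hypothetical helper: x is one row as a numeric vector
zero_after_stop <- function(x, n_stop = 3) {
  r <- rle(x)  # rle() puts every NA in a run of its own
  # runs of zeros that are at least n_stop long
  hit <- which(!is.na(r$values) & r$values == 0 & r$lengths >= n_stop)
  if (length(hit) == 0) return(x)           # no stopping rule triggered
  end_of_run <- cumsum(r$lengths)[hit[1]]   # last position of the first such run
  after <- seq_along(x) > end_of_run
  x[after & is.na(x)] <- 0                  # NAs after the run become 0
  x
}

# Apply row by row; apply() returns one column per row, hence t()
fixed <- as.data.frame(t(apply(df, 1, zero_after_stop)))
fixed
```

A row such as c(0, NA, 0, NA, 0, NA) is returned unchanged, because its zeros never form a run of three.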