Let's say I have the following matrix:
mat.data <- c(1, NA, NA, NA, 1, NA, 1, 2, 3)
mat <- matrix(mat.data,nrow=3,ncol=3,byrow=TRUE)
mat
     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]   NA    1   NA
[3,]    1    2    3
The columns are sequential measurements of 3 different individuals (the rows). Here, NA could mean either that the person is dead or that the value is missing. It can be assumed that if an NA value is followed only by NA values, the person is dead; otherwise, the value is missing.
As such, I am looking to create a for loop with if statements that changes each NA to 100 if the value is missing or to 99 if the person is dead, so that we end up with the following matrix:
mat1.data <- c(1, 99, 99, 100, 1, 99, 1, 2, 3)
mat1 <- matrix(mat1.data,nrow=3,ncol=3,byrow=TRUE)
mat1
     [,1] [,2] [,3]
[1,]    1   99   99
[2,]  100    1   99
[3,]    1    2    3
I am having an issue categorising the missing values. I want the result to equal 100 if mat[r,c] is NA and there are non-NA values after it in the row. This is the code I started with, but I am unsure what to do for the part after the &&.
mat1 <- matrix()
for (x in 1:nrow(mat)) {
  for (y in 1:ncol(mat)) {
    if (is.na(mat[x,y]) && (!is.na(mat[x,y + c(0:(ncol(mat)-y))]))){
      mat1[x,y] = 100
    }
    else if(is.na(mat[x,y])){
      mat1[x,y] = 99
    }
    else
      mat1[x,y] = mat[x,y]
  }
}
CodePudding user response:
We could mess around with the cumsum of the absolute differences (diff) of is.na after reversing (rev) a row.
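f <- \(x) {
  # Scan the row from its end, starting from TRUE, and count NA/non-NA switches.
  # Zero switches means the value belongs to the trailing run of NAs.
  dead <- rev(cumsum(abs(diff(c(TRUE, rev(is.na(x))))))) == 0
  x[dead] <- 99
  x[is.na(x) & !dead] <- 100
  x
}
t(apply(mat, 1, f))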
#      [,1] [,2] [,3]
# [1,]    1   99   99
# [2,]  100    1   99
# [3,]    1    2    3
Or, if you prefer the for loop:
for (i in seq_len(nrow(mat))) {
  # Zero NA/non-NA switches counted from the row's end marks the trailing NA run.
  dead <- rev(cumsum(abs(diff(c(TRUE, rev(is.na(mat[i, ]))))))) == 0
  mat[i, dead] <- 99
  mat[i, is.na(mat[i, ]) & !dead] <- 100
}
mat
#      [,1] [,2] [,3]
# [1,]    1   99   99
# [2,]  100    1   99
# [3,]    1    2    3
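If the cumsum/diff chain is hard to read, the same rule can probably be written more directly by finding the last observed (non-NA) position in each row. This is only a sketch: recode_na, its dead and missing arguments, and the test matrix mat2 are names made up for illustration, and it assumes a row that is entirely NA should count as dead from the start.

```r
# Hypothetical helper (illustration only): recode the NAs of one row.
recode_na <- function(x, dead = 99, missing = 100) {
  observed <- which(!is.na(x))
  # Position of the last observed value; 0 if the whole row is NA
  # (assumption: such a row is treated as dead throughout).
  last_obs <- if (length(observed) > 0) max(observed) else 0
  pos <- seq_along(x)
  x[is.na(x) & pos > last_obs] <- dead  # NAs after the last observation
  x[is.na(x)] <- missing                # any NA still left is a gap
  x
}

# Same data as the question, plus a row with a leading NA only.
mat2 <- matrix(c( 1, NA, NA,
                 NA,  1, NA,
                  1,  2,  3,
                 NA,  5,  6), ncol = 3, byrow = TRUE)
res <- t(apply(mat2, 1, recode_na))
res
```

The fourth row is there to check the case where a row starts with NA but ends with observed values; that NA should come out as 100, not 99.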
CodePudding user response:
This ended up not being as nice as I would have liked, but it should work:
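# For each row, run-length encode is.na() and keep the column positions
# of the final run when that run consists of NAs.
na.ends <- lapply(apply(is.na(mat), 1, rle), function(x) {
  last <- length(x$values)
  if (x$values[last]==1) {
    sum(x$lengths) - x$lengths[last]:1 + 1
  } else {
    numeric(0)
  }
})
# Build a two-column (row, column) index matrix of the trailing NAs.
dead.pos <- do.call("rbind", Map(function(x, y) if (length(y)>0) cbind(x,y) else NULL, seq_along(na.ends), na.ends))
mat[dead.pos] <- 99
mat[is.na(mat)] <- 100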
The idea is that we use rle to calculate runs of NA values and then find the runs that sit at the end of a row. We then grab the indexes of those values and assign them all to 99 in one go using matrix indexing. Any NA values still remaining in the matrix must then be missing rather than dead, so we can just fill them with 100.
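In case the matrix indexing step is unfamiliar, here is a small standalone sketch of it. The matrix m and the index matrix idx are made-up values for illustration; the point is just that indexing a matrix with a two-column matrix treats each row of the index as one (row, column) pair.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Each row of idx names one cell: (1,2), (1,3) and (2,3).
idx <- rbind(c(1, 2),
             c(1, 3),
             c(2, 3))

# Overwrites exactly those three cells in a single assignment.
m[idx] <- 99
m
```

This is why dead.pos is built with cbind and rbind: it ends up as the same kind of two-column matrix of cell coordinates.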