Context: I'm working with survey data organized as a 4D array with this structure: m[n_sites, n_surveys, n_years, n_species].
Question: Some values are missing at random, and I want to move the missing values (NA) to the end of each row while keeping the observed values in their original order.
Example: Here's the original data:
, , 1, 1
1 2 3 4 5
1 NA 2 NA 2 3
2 NA 3 1 NA NA
3 4 NA NA 4 6
4 2 NA NA 2 1
... and I want to rearrange this to be:
, , 1, 1
1 2 3 4 5
1 2 2 3 NA NA
2 3 1 NA NA NA
3 4 4 6 NA NA
4 2 2 1 NA NA
Note: The data are very large, so I need something efficient and fairly simple.
Reproducible code:
library(magrittr) ## for %>% pipe
library(reshape2) ## for acast
set.seed(1)
# Simulate survey data
df <- expand.grid(
species = c(1,2),
year = c(1,2,3),
site = c(1,2,3,4),
survey = c(1,2,3,4,5))
df$counts <- rpois(n = nrow(df), lambda = 3)
# Add random NAs (missing data)
posNA <- sample(x = 1:nrow(df), size = 0.5 * nrow(df), replace = FALSE)
df$counts[posNA] <- NA
# Cast to 4d array
m <- df %>% acast(site ~ survey ~ year ~ species)
Answer 1:
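We could loop over the year and species dimensions and, in every site-by-survey slice, reorder each row with order(is.na(x)):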
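m1 <- m
for (i in seq_len(dim(m)[3]))    # years
  for (j in seq_len(dim(m)[4]))  # species
    # reorder each site row: observed values first, NAs last;
    # apply() returns one column per row, hence the t()
    m1[, , i, j] <- t(apply(m1[, , i, j], 1,
                            function(x) x[order(is.na(x))]))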
Output:
> m1[,, 1, 1]
1 2 3 4 5
1 2 2 3 NA NA
2 3 1 NA NA NA
3 4 4 6 NA NA
4 2 2 1 NA NA
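The whole answer rests on the per-row expression x[order(is.na(x))]. Here is a minimal sketch of it on a single row (site 1 of slice [, , 1, 1] from the question). It relies on order() using a stable sort by default, so tied entries (all the observed values share the key FALSE) stay in their original order instead of being sorted by value.

```r
# Site 1 of slice [, , 1, 1] from the question
x <- c(NA, 2, NA, 2, 3)

# is.na(x) is FALSE for observed values and TRUE for NAs, and FALSE sorts
# before TRUE, so order() lists the observed positions first
ord <- order(is.na(x))   # c(2L, 4L, 5L, 1L, 3L)

# Ties keep their original order, so 2, 2, 3 are not re-sorted by value
x_sorted <- x[ord]       # c(2, 2, 3, NA, NA)
```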
Answer 2:
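You may apply na.omit() to every row, i.e. over margins 1, 3 and 4 (sites, years and species), pad each result back to the number of columns (surveys) with `length<-`, and then transpose the result with aperm(), because apply() returns each row's result along the first dimension.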
## 4D-array (the \(x) lambda and the |> pipe need R >= 4.1)
apply(m, c(1, 3, 4), \(x) `length<-`(na.omit(x), dim(m)[2])) |> aperm(c(2, 1, 3, 4))
# , , 1, 1
#
# [,1] [,2] [,3] [,4] [,5]
# 1 2 2 3 NA NA
# 2 3 1 NA NA NA
# 3 4 4 6 NA NA
# 4 2 2 1 NA NA
#
# , , 2, 1
#
# [,1] [,2] [,3] [,4] [,5]
# 1 3 0 3 NA NA
# 2 3 3 2 3 NA
# 3 3 4 NA NA NA
# 4 6 3 1 NA NA
#
# , , 3, 1
#
# [,1] [,2] [,3] [,4] [,5]
# 1 2 NA NA NA NA
# 2 2 1 NA NA NA
# 3 4 2 NA NA NA
# 4 4 0 4 3 NA
#
# , , 1, 2
#
# [,1] [,2] [,3] [,4] [,5]
# 1 2 2 NA NA NA
# 2 4 3 8 NA NA
# 3 2 1 2 3 NA
# 4 4 NA NA NA NA
#
# , , 2, 2
#
# [,1] [,2] [,3] [,4] [,5]
# 1 5 5 NA NA NA
# 2 4 NA NA NA NA
# 3 2 1 NA NA NA
# 4 5 NA NA NA NA
#
# , , 3, 2
#
# [,1] [,2] [,3] [,4] [,5]
# 1 5 2 2 2 NA
# 2 1 4 2 3 NA
# 3 8 2 3 NA NA
# 4 5 NA NA NA NA
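The two building blocks of this answer can be checked in isolation. The sketch below shows that `length<-` pads the na.omit() result back to full length with NA (and, per the R documentation for length, drops attributes other than names, so the "na.action" attribute added by na.omit() goes away), and why the aperm() step is needed: apply() stacks each returned vector along the first dimension of its result.

```r
x <- c(NA, 2L, NA, 2L, 3L)

# na.omit() drops the NAs and records them in an "na.action" attribute
kept <- na.omit(x)

# `length<-` pads back to the original length with NA; it also drops
# every attribute except names, so "na.action" disappears
padded <- `length<-`(kept, length(x))   # c(2L, 2L, 3L, NA, NA)

# apply() places each returned vector along the FIRST dimension of the
# result, so a row-wise apply() on a 2 x 3 matrix comes back as 3 x 2
mat <- matrix(1:6, nrow = 2)
res <- apply(mat, 1, identity)
dim(res)                                 # c(3L, 2L)
```

This is the same reason Answer 1 wraps its apply() call in t().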
Here is the same approach applied to a simpler 3D array to demonstrate the logic:
## 3D-array
a
# , , 1
#
# [,1] [,2] [,3] [,4]
# [1,] NA NA 1 3
# [2,] 4 1 1 NA
# [3,] NA 3 NA 4
#
# , , 2
#
# [,1] [,2] [,3] [,4]
# [1,] 3 2 2 4
# [2,] 1 NA 3 4
# [3,] 1 NA 4 3
apply(a, c(1, 3), \(x) `length<-`(na.omit(x), dim(a)[2])) |> aperm(c(2, 1, 3))
# , , 1
#
# [,1] [,2] [,3] [,4]
# [1,] 1 3 NA NA
# [2,] 4 1 1 NA
# [3,] 3 4 NA NA
#
# , , 2
#
# [,1] [,2] [,3] [,4]
# [1,] 3 2 2 4
# [2,] 1 3 4 NA
# [3,] 1 4 3 NA
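The 3D and 4D calls differ only in the margins and the aperm() permutation, so the pattern can be wrapped once. The helper below is a hypothetical generalization (the name push_na_along and its along argument are my own, not from the answer): it applies over every dimension except along, then uses order(c(along, margins)) as the inverse permutation to move the padded dimension back into place. It assumes the array has at least two dimensions and at least two positions along along; otherwise apply() simplifies the result to a vector.

```r
# Hypothetical helper: push NAs to the end along dimension `along` of any
# array, using the same apply() + `length<-` + aperm() steps as above
push_na_along <- function(arr, along = 2) {
  margins <- setdiff(seq_along(dim(arr)), along)  # every other dimension
  n <- dim(arr)[along]
  # Each call returns a length-n vector, which apply() stacks along dim 1,
  # so `res` has dimensions c(n, dim(arr)[margins])
  res <- apply(arr, margins, function(x) `length<-`(na.omit(x), n))
  # order(c(along, margins)) is the inverse permutation that moves
  # dimension 1 of `res` back to position `along`
  aperm(res, order(c(along, margins)))
}
```

With along = 2, push_na_along(m) reproduces the 4D call above (perm c(2, 1, 3, 4)) and push_na_along(a) the 3D one (perm c(2, 1, 3)).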
Data:
m <- structure(c(NA, NA, 4L, 2L, 2L, 3L, NA, NA, NA, 1L, NA, NA, 2L,
NA, 4L, 2L, 3L, NA, 6L, 1L, 3L, 3L, NA, 6L, 0L, 3L, NA, 3L, NA,
2L, 3L, 1L, 3L, 3L, 4L, NA, NA, NA, NA, NA, 2L, 2L, 4L, 4L, NA,
NA, NA, 0L, NA, NA, NA, NA, NA, NA, NA, 4L, NA, 1L, 2L, 3L, NA,
4L, 2L, NA, NA, 3L, 1L, NA, NA, NA, 2L, 4L, 2L, NA, NA, NA, 2L,
8L, 3L, NA, 5L, NA, NA, NA, NA, NA, NA, NA, 5L, NA, 2L, NA, NA,
4L, 1L, 5L, NA, NA, NA, NA, 5L, 1L, 8L, NA, 2L, 4L, NA, NA, 2L,
NA, 2L, 5L, NA, 2L, NA, NA, 2L, 3L, 3L, NA), dim = c(4L, 5L,
3L, 2L), dimnames = list(c("1", "2", "3", "4"), c("1", "2", "3",
"4", "5"), c("1", "2", "3"), c("1", "2")))
a <- structure(c(NA, 4L, NA, NA, 1L, 3L, 1L, 1L, NA, 3L, NA, 4L, 3L,
1L, 1L, 2L, NA, NA, 2L, 3L, 4L, 4L, 4L, 3L), dim = c(3L, 4L,
2L))
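Since the question stresses that the data are very large, here is a different technique from either answer, offered as a sketch rather than a tested recommendation: reshape the array into a matrix with one row per site/year/species and one column per survey, then reorder all cells with a single order() call keyed on row, NA flag and column. This avoids calling an R function once per row, but row(), col() and is.na() each allocate a vector as large as the data, so it trades memory for fewer calls; timing it against the answers on the real data would be worthwhile. The function name push_na_last is hypothetical, and it assumes the m[site, survey, year, species] layout from the question.

```r
# Hypothetical helper (not from either answer): move NAs to the end of every
# site x year x species row of m[site, survey, year, species] with ONE
# global order() call instead of one R function call per row.
push_na_last <- function(arr) {
  d  <- dim(arr)
  dn <- dimnames(arr)
  # Put the survey dimension last: [site, year, species, survey]
  p <- aperm(arr, c(1, 3, 4, 2))
  # View as a matrix with one row per site/year/species, one column per survey
  # (dim<- drops dimnames, so they are restored at the end)
  dim(p) <- c(length(arr) / d[2], d[2])
  # Sort all cells by row, then by NA flag (observed first), then by column;
  # read back row by row, this gives observed values in order, then the NAs
  idx <- order(row(p), is.na(p), col(p))
  M <- matrix(p[idx], nrow = nrow(p), byrow = TRUE)
  # Restore the 4D shape and the original dimension order
  dim(M) <- d[c(1, 3, 4, 2)]
  out <- aperm(M, c(1, 4, 2, 3))
  dimnames(out) <- dn
  out
}
```

Usage: m2 <- push_na_last(m), after which m2[, , 1, 1] should match the Answer 1 output. Unlike the apply()/aperm() version, this keeps the original dimnames of m.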