Say I have the following list, where each element is a data.frame of a different size:
df1 <- data.frame(matrix(rnorm(12346), ncol = 2))
df2 <- data.frame(matrix(rnorm(14330), ncol = 2))
df3 <- data.frame(matrix(rnorm(2422), ncol = 2))
l <- list(df1, df2, df3)
In my example each data.frame represents a year of observations, and clearly df3
contains a lot fewer observations compared to the other two.
My question is then:
What is the best approach to detect those elements of the list l
whose number of rows is not comparable to the others, and then remove them from the list?
So far I've tried using the median, but since this should always remove about half of the elements in l,
I'm not sure it is the best solution for future use:
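library(collapse)
library(magrittr)  # provides the %>% pipe
# Median number of rows across all data.frames in the list
cutoff <- input %>%
  vapply(nrow, FUN.VALUE = integer(1)) %>%
  median()
# TRUE for every element with at least the median number of rows
idx <- dapply(X = input, FUN = function(x) nrow(x) >= cutoff)
input[idx]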
where input is a list like l above.
NOTE: As this is my first question on SO, please feel free to edit the question if it does not live up to the standards of this community, or to give feedback on asking better questions. Thanks in advance.
EDIT:
The question is not so much how to use the median to remove elements of the list, but rather whether the median is the right method for removing those data.frames that have far fewer observations than the others.
CodePudding user response:
Does this work:
l[sapply(l, function(x) nrow(x) >= median(unlist(lapply(l, nrow))))]
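The same idea can be written so that the row counts are computed once instead of inside every iteration. This is only a sketch; the names n_rows and kept are illustrative and do not appear in the question.

```r
# Example data, as in the question
df1 <- data.frame(matrix(rnorm(12346), ncol = 2))
df2 <- data.frame(matrix(rnorm(14330), ncol = 2))
df3 <- data.frame(matrix(rnorm(2422), ncol = 2))
l <- list(df1, df2, df3)

# Row count of every data.frame, computed once
n_rows <- vapply(l, nrow, FUN.VALUE = integer(1))

# Keep the elements with at least the median number of rows
kept <- l[n_rows >= median(n_rows)]
length(kept)
```

With the example list this keeps df1 and df2, because the median row count is that of df1.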
CodePudding user response:
purrr::keep is the way to go when filtering lists with conditions.
library(purrr)
keep(l, ~ nrow(.x) > median(map_dbl(l, nrow)))
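On the point raised in the EDIT: a cutoff at the median itself drops roughly half of the list by construction, however similar the sizes are (and with a strict > the element sitting exactly at the median is dropped as well). One possible alternative, sketched here under assumptions, is to keep every element whose row count is at least some fraction of the median. The fraction 0.5 and the names min_frac, n_rows and kept are hypothetical choices for illustration, not something taken from the question.

```r
library(purrr)

# Example data, as in the question
l <- list(
  data.frame(matrix(rnorm(12346), ncol = 2)),
  data.frame(matrix(rnorm(14330), ncol = 2)),
  data.frame(matrix(rnorm(2422), ncol = 2))
)

# Assumed tolerance: an element is kept if it has at least
# half as many rows as the median element
min_frac <- 0.5

# Row counts, computed once
n_rows <- map_int(l, nrow)

# Keep only the elements that reach the relative threshold
kept <- keep(l, ~ nrow(.x) >= min_frac * median(n_rows))
map_int(kept, nrow)
```

With this rule a list of similarly sized data.frames is left untouched, while an element such as the third one above is removed. Whether 0.5 is a sensible fraction depends on the data.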