I have a list called "data". It consists of 10 elements (lists), each containing a different number of elements (also lists):
lengths(data)
[1] 26 33 3 20 22 21 17 18 12 29
Thus, the first element of the list consists of 26 elements, the second of 33, and so on. Each of these elements is a data frame (a "tibble") with 6 columns (the first four integer, the fifth logical, and the last character), for instance
colnames(data[[1]][[1]])
[1] "width" "height" "x" "y" "space" "text"
Although the column structure of the data frames is consistent within and across groups, the number of rows differs for each data frame, even within a group.
I want to find the mode of "height" across the data frames grouped within the same element.
Thus, there is one common mode for the 26 data frames within the first element, and so on. In other words, I want to pool the data from the 26 data frames within the first element, calculate the mode, and then write the result as a new column to each of those data frames so that I can perform different operations for rows with height above, below, and equal to the mode.
This is what I have figured out so far. It is not correct, but it should produce the same result in most cases:
getmode <- function(v) {
    uniqv <- unique(v)
    uniqv[which.max(tabulate(match(v, uniqv)))]
}
mode <- lapply(data, function(x) lapply(lapply(x, '[[', 'height'), getmode)) #find mode height for each paper and each page
mode2 <- lapply(mode, function (x) getmode(x)) # find mode for each paper
CodePudding user response:
Here is one option: loop over the outer list with map, bind the inner list elements into a single data frame with bind_rows (from dplyr), creating a column 'grp', apply getmode on the combined 'height' column to create a new column, and then split the dataset by the 'grp' column.
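library(purrr)
library(dplyr)
map(data, ~ bind_rows(.x, .id = 'grp') %>%
    # for an unnamed inner list, .id stores positions as character;
    # convert to integer so the split keeps the original order (1, 2, ..., 10)
    mutate(grp = as.integer(grp), Mode = getmode(height)) %>%
    group_split(grp, .keep = FALSE))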
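As a rough illustration of the pooled approach on a small input, the sketch below also labels each row as above, below, or equal to the mode, which is what the question ultimately asks for. The list toy, its values, and the column name position are made up for this example. The grp column is converted to integer here on the assumption that the inner list is unnamed, so that the split keeps the original order.

```r
library(purrr)
library(dplyr)

getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical toy input: one group holding three small data frames
toy <- list(list(
  tibble(height = c(10L, 10L, 12L)),
  tibble(height = c(12L, 12L, 12L)),
  tibble(height = c(10L, 14L))
))

res <- map(toy, ~ bind_rows(.x, .id = "grp") %>%
  mutate(
    grp = as.integer(grp),        # assumes an unnamed inner list
    Mode = getmode(height),       # mode of the pooled heights (12 here)
    position = case_when(         # hypothetical helper column
      height > Mode ~ "above",
      height < Mode ~ "below",
      TRUE ~ "equal"
    )
  ) %>%
  group_split(grp, .keep = FALSE))

res[[1]][[3]]
```

Each inner data frame keeps its own rows and gains the group-wide Mode, so later steps can filter on position without recomputing anything.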
Or loop over the outer list with lapply, loop over the inner list with sapply, extract 'height' from each inner list element and apply getmode, which returns a vector of mode values to which getmode is applied again. Then loop over the inner list and create a new column with the resulting mode value. Note that this takes the mode of the per-data-frame modes, which can differ from the mode of the pooled 'height' values.
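lapply(data, function(x) {
    # mode of the per-data-frame modes for this group
    mode <- getmode(sapply(x, function(y) getmode(y$height)))
    # append it to every data frame as a column named 'mode'
    lapply(x, function(y) cbind(y, mode))
})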
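To see how the mode of the per-data-frame modes can disagree with the mode of the pooled values, here is a small base R sketch. The object grp_list and its heights are invented for illustration. The last step shows one possible base R way to write the pooled mode into each data frame.

```r
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical group of three data frames
grp_list <- list(
  data.frame(height = c(5L, 5L, 5L, 5L, 5L, 5L)),
  data.frame(height = c(7L, 7L, 8L)),
  data.frame(height = c(7L, 7L, 9L))
)

# Mode of the per-data-frame modes (5, 7, 7)
mode_of_modes <- getmode(sapply(grp_list, function(y) getmode(y$height)))

# Mode of all heights pooled together (5 occurs six times, 7 four times)
pooled_mode <- getmode(unlist(lapply(grp_list, `[[`, "height")))

# Base R variant: write the pooled mode into every data frame
out <- lapply(grp_list, function(y) cbind(y, Mode = pooled_mode))
```

Here mode_of_modes is 7 while pooled_mode is 5, because the first data frame contributes many rows but only one vote when modes are taken per data frame first.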
Data used for testing:
set.seed(24)
data1 <- replicate(26, head(mtcars) %>% mutate(height = rnorm(6)), simplify = FALSE)
data2 <- replicate(33, head(iris) %>% mutate(height = rnorm(6)), simplify = FALSE)
data <- list(data1, data2)
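Because rnorm draws continuous values, the heights in the data above are almost surely all distinct, so getmode simply returns the first value it sees. For experimenting with a meaningful mode, a variant with integer heights may be more convenient. The helper make_page and the range 10:12 below are assumptions made for this sketch, not part of the original data.

```r
set.seed(24)

# Hypothetical helper: a data frame with n_rows integer heights drawn from 10:12
make_page <- function(n_rows) {
  data.frame(height = sample(10:12, n_rows, replace = TRUE))
}

data1 <- replicate(26, make_page(6), simplify = FALSE)
data2 <- replicate(33, make_page(6), simplify = FALSE)
data <- list(data1, data2)

lengths(data)
```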