I want to find a mode for each group of dataframes within the element of a list, and write the resul

Time:12-12

I have a list called "data". It consists of 10 elements (lists), each containing a different number of elements:

lengths(data)

[1] 26 33 3 20 22 21 17 18 12 29

Thus, the first element of the list contains 26 elements, the second 33, and so on. Each of these inner elements is a data frame (a tibble) with 6 columns (the first four integer, the fifth logical and the last character), for instance:

colnames(data[[1]][[1]])

[1] "width" "height" "x" "y" "space" "text"

Although the column structure of the data frames is the same within and across groups, the number of rows differs from one data frame to the next, even within a group.
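To experiment without the original data, a nested list with the same shape can be mocked up. This is only a sketch with made-up values: make_page and mock are hypothetical names, and plain data frames stand in for tibbles so that no packages are needed.

```r
# Hypothetical helper: one "page" data frame with the six columns described
# above. Values are random placeholders; only the structure mirrors the question.
make_page <- function(n_rows) {
  data.frame(
    width  = sample(5:20, n_rows, replace = TRUE),     # integer
    height = sample(8:14, n_rows, replace = TRUE),     # integer
    x      = sample(0:500, n_rows, replace = TRUE),    # integer
    y      = sample(0:700, n_rows, replace = TRUE),    # integer
    space  = sample(c(TRUE, FALSE), n_rows, replace = TRUE),  # logical
    text   = sample(letters, n_rows, replace = TRUE),  # character
    stringsAsFactors = FALSE
  )
}

set.seed(1)
# Three outer elements holding 4, 2 and 3 data frames; each data frame gets
# a random number of rows between 3 and 10.
mock <- lapply(c(4, 2, 3), function(n_pages) {
  lapply(sample(3:10, n_pages, replace = TRUE), make_page)
})

lengths(mock)
#> [1] 4 2 3
```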

I want to find the mode of "height" across the data frames grouped within the same element.

That is, there should be one common mode for the 26 data frames within the first element, another for the 33 within the second, and so on. In other words, I want to pool the data of the 26 data frames within the first element, calculate the mode, and then write the result as a new column to each of those data frames, so that I can perform different operations for rows with a height above, below and equal to the mode.

This is what I have figured out so far. It is not strictly correct (it takes the mode of the per-data-frame modes rather than the mode of the pooled heights), but it should give the same result in most cases:

# most frequent value in v; ties go to the value that appears first
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
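A quick check of how getmode behaves on a few made-up vectors: with ties it returns the value that appears first, and NA is counted like any other value, so heights containing missing values may need na.omit() before the call.

```r
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

getmode(c(12, 10, 12, 11))  # 12 occurs twice
#> [1] 12
getmode(c(3, 1, 1, 3))      # tie between 3 and 1: the first seen (3) wins
#> [1] 3
getmode(c(NA, NA, 5))       # NA is the most frequent "value"
#> [1] NA
```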

mode <- lapply(data, function(x) sapply(x, function(y) getmode(y$height)))  # mode height of each page (data frame) of each paper
mode2 <- sapply(mode, getmode)  # mode of those page modes, one per paper
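To see why the mode of the per-page modes is not always the mode of the whole group, here is a small made-up group of three pages. Each page has its own clear mode, but the value shared across all pages is the most frequent overall.

```r
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical heights for three pages of one paper.
pages <- list(
  data.frame(height = c(5, 5, 9, 9, 9)),
  data.frame(height = c(5, 5, 7, 7, 7)),
  data.frame(height = c(5, 5, 8, 8, 8))
)

# Mode of the per-page modes (the approach above).
page_modes <- sapply(pages, function(p) getmode(p$height))
page_modes           # 9 7 8
getmode(page_modes)  # 9: all three tie, so the first one is returned

# Mode of all heights pooled across the pages.
pooled <- unlist(lapply(pages, `[[`, "height"))
getmode(pooled)      # 5: it occurs 6 times out of 15
```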

Answer:

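Here is one option: loop over the outer list with map (from purrr); within each element, bind the inner data frames into a single data frame with bind_rows (from dplyr), using .id to create a column grp that records which data frame each row came from; apply getmode to the combined 'height' column to create a new column Mode; then split the data back by grp. Because bind_rows stores grp as text ("1", "2", ..., "10", ...), it is converted to a factor with levels in order of appearance, so that the pieces come back in their original order instead of being sorted as text.

library(purrr)
library(dplyr)
map(data, ~ bind_rows(.x, .id = 'grp') %>% 
    mutate(grp = factor(grp, levels = unique(grp)),
           Mode = getmode(height)) %>%
    group_split(grp, .keep = FALSE))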
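One caveat when splitting on grp: bind_rows(.id = ...) on an unnamed list stores the ids as character ("1", "2", ...), and grouping on a character key sorts it as text. With ten or more data frames, "10" then comes before "2", and the split pieces no longer line up with the original list. Base R's split() shows the same ordering. The sketch below uses a made-up stand-in for the bound data and keeps the input order by splitting on a factor whose levels follow first appearance.

```r
# Made-up stand-in for the output of bind_rows(.x, .id = "grp"):
# twelve one-row pieces with a character "grp" column.
combined <- data.frame(grp = as.character(1:12), height = 1:12,
                       stringsAsFactors = FALSE)

# Splitting on a character key sorts the pieces as text ...
names(split(combined, combined$grp))[1:3]   # "1" "10" "11"

# ... whereas a factor with levels in order of appearance keeps the input order.
in_order <- split(combined, factor(combined$grp, levels = unique(combined$grp)))
names(in_order)[1:3]                         # "1" "2" "3"
```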

Or, in base R: loop over the outer list with lapply, pool the 'height' values of all inner data frames with unlist, and apply getmode to the pooled vector. Then loop over the inner list and add that mode as a new column. (Applying getmode to a vector of per-data-frame modes, as in the question, can give a different answer from the pooled mode.)

lapply(data, function(x) {
   m <- getmode(unlist(lapply(x, `[[`, "height")))  # mode of the pooled heights
   lapply(x, function(y) {
     y$Mode <- m  # same value on every row of every data frame in this element
     y
   })
 })
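Once each data frame carries the group mode, the above/below/equal split mentioned in the question can be done row by row. A minimal base-R sketch, assuming a 'height' column and a mode value that has already been computed; classify_height and the relation column are hypothetical names:

```r
# Hypothetical helper: label each row's height relative to the group mode.
classify_height <- function(df, mode_value) {
  df$relation <- ifelse(df$height > mode_value, "above",
                 ifelse(df$height < mode_value, "below", "equal"))
  df
}

page <- data.frame(height = c(10L, 12L, 11L, 14L))
classify_height(page, mode_value = 12L)$relation
#> [1] "below" "equal" "below" "above"
```

Applied to a nested result `res` whose data frames have a Mode column, this could be `lapply(res, lapply, function(df) classify_height(df, df$Mode[1]))`.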

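data (reproducible example; heights are drawn from a few integers so that values repeat and the mode is meaningful)

library(dplyr)
set.seed(24)
data1 <- replicate(26, head(mtcars) %>% mutate(height = sample(10:14, 6, replace = TRUE)), simplify = FALSE)
data2 <- replicate(33, head(iris) %>% mutate(height = sample(10:14, 6, replace = TRUE)), simplify = FALSE)
data <- list(data1, data2)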