How to permute a function in nested list in R?

Time:03-29

I'm trying to calculate the gower::gower_dist() index between each subset of a nested list and the subsets that follow it.

That is, I have a nested list with ten rows in each subset.

I would like to:

  • Calculate the gower::gower_dist() index between the first set of 10 rows and the next set, then between the first and the third, and so on.

  • Calculate the average value for each comparison

  • Order from highest to lowest to identify the comparison that had the highest mean value

A reproducible example:

list_to_split <- data.frame(rnorm(100), rnorm(100), rnorm(100))
names(list_to_split) <- c("var_1", "var_2", "var_3")

n <- 10
nr <- nrow(list_to_split)

nested_list<-split(list_to_split[,c(1:3)], rep(1:ceiling(nr/n), each=n, length.out=nr))

Below is a piece of the calculation I intend to do:

dt_1 <- list_to_split[c(1:10),]
dt_2 <- list_to_split[c(11:20),]

gower_test <- gower::gower_dist(dt_1,  dt_2)

mean(gower_test)

> gower::gower_dist(dt_1,  dt_2)
 [1] 0.45988316 0.04906887 0.31952329 0.54794324 0.23139261 0.26743197 0.27649944 0.35229745 0.19163644 0.20118909

> mean(gower_test)
[1] 0.2896866

The example above compares only the first subset with the second. I would like to do this for the entire list and test all combinations.

CodePudding user response:

library(tidyr)
library(purrr)
library(dplyr)

# build a nested data frame with one row per block of 10 rows
# (rows 1..10 get row_id 1, rows 11..20 get row_id 2, etc.)
df <- list_to_split %>% 
    mutate(row_id = (row_number() - 1) %/% 10 + 1) %>% 
    group_by(row_id) %>% 
    nest()

# create cartesian product
crossing(a = df, b = df) %>% 
    
    # compute gdist for each combo
    mutate(gdist = map2(a$data, b$data, gower::gower_dist)) %>% 
    
    # compute avg value for each
    mutate(gavg = map_dbl(gdist, mean)) %>% 
    
    # order
    arrange(-gavg)
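
If you would rather stay in base R, the same steps could be sketched roughly as below. This is only a sketch under a few assumptions: the gower package is installed, set.seed() is added purely so the run is repeatable (it is not in the question), and each unordered pair is compared once, so 1 vs 2 appears but 2 vs 1 and 1 vs 1 do not. The names pairs, mean_dist and result are just illustrative.

```r
# Assumption: set.seed() is only here for reproducibility.
set.seed(42)
list_to_split <- data.frame(var_1 = rnorm(100),
                            var_2 = rnorm(100),
                            var_3 = rnorm(100))

n  <- 10
nr <- nrow(list_to_split)

# Split into blocks of 10 rows; the list elements are named "1".."10"
nested_list <- split(list_to_split,
                     rep(1:ceiling(nr / n), each = n, length.out = nr))

# All unordered pairs of block names: 1-2, 1-3, ..., 9-10 (one pair per row)
pairs <- t(combn(names(nested_list), 2))

# Mean Gower distance for each pair; gower_dist() compares the two
# blocks row by row and returns one distance per row
mean_dist <- apply(pairs, 1, function(p) {
  mean(gower::gower_dist(nested_list[[p[1]]], nested_list[[p[2]]]))
})

result <- data.frame(set_a = pairs[, 1],
                     set_b = pairs[, 2],
                     mean_gower = mean_dist)

# Order from highest to lowest mean distance
result <- result[order(result$mean_gower, decreasing = TRUE), ]
head(result)
```

The first row of result is then the pair of blocks with the highest mean distance. If you need both orderings or the self-comparisons as well, expand.grid() over the names could replace combn().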