How to permute a function in a nested list in R?

I'm trying to calculate the gower::gower_dist() index between one subset of a nested list and each of the subsets that follow it.

That is, I have a nested list in which each subset has ten rows.

I would like to:

1. Calculate the gower::gower_dist() index between the first set of 10 rows and the second set, then between the first and the third, and so on.
2. Calculate the average value for each comparison.
3. Order the averages from highest to lowest to identify the comparison with the highest mean value.

A reproducible example:

list_to_split <- data.frame(rnorm(100), rnorm(100), rnorm(100))
names(list_to_split) <- c("var_1", "var_2", "var_3")

n <- 10
nr <- nrow(list_to_split)

nested_list<-split(list_to_split[,c(1:3)], rep(1:ceiling(nr/n), each=n, length.out=nr))

Below is a piece of the calculation I intend to do:

dt_1 <- list_to_split[c(1:10),]
dt_2 <- list_to_split[c(11:20),]

gower_test <- gower::gower_dist(dt_1,  dt_2)

mean(gower_test)

> gower::gower_dist(dt_1,  dt_2)
 [1] 0.45988316 0.04906887 0.31952329 0.54794324 0.23139261 0.26743197 0.27649944 0.35229745 0.19163644 0.20118909

> mean(gower_test)
[1] 0.2896866

The example above compares only the first subset with the second. I would like to do this for the entire list and test all combinations.

CodePudding user response:

library(tidyr)
library(purrr)
library(dplyr)

# create a nested data frame where we have created 10 lists
# for rows 1..10, 11..20, etc
df <- list_to_split %>% 
    mutate(row_id = (row_number() - 1) %/% 10 + 1) %>% 
    group_by(row_id) %>% 
    nest()

# create cartesian product
crossing(a = df, b = df) %>% 
    
    # compute gdist for each combo
    mutate(gdist = map2(a$data, b$data, gower::gower_dist)) %>% 
    
    # compute avg value for each
    mutate(gavg = map_dbl(gdist, mean)) %>% 
    
    # order
    arrange(-gavg)
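
A Cartesian product also pairs each chunk with itself and lists every pair in both orders. If only distinct, unordered pairs are of interest, a base-R variant built on combn() might look like the sketch below. The helper name pairwise_mean_dist and its dist_fun argument are hypothetical names introduced for illustration only; the sketch assumes every chunk has the same number of rows, and the gower call is guarded so the script still runs when the package is not installed.

```r
# Hypothetical helper (not part of any package): mean distance for every
# unordered pair of chunks. `dist_fun` is assumed to be a function(x, y)
# that returns a numeric vector of row-wise distances.
pairwise_mean_dist <- function(chunks, dist_fun) {
  # Index pairs (1, 2), (1, 3), ..., one column per pair
  pairs <- combn(seq_along(chunks), 2)

  # Use the chunk names as labels when they exist, otherwise the positions
  labels <- if (is.null(names(chunks))) seq_along(chunks) else names(chunks)

  # One mean distance per pair
  means <- vapply(
    seq_len(ncol(pairs)),
    function(k) mean(dist_fun(chunks[[pairs[1, k]]], chunks[[pairs[2, k]]])),
    numeric(1)
  )

  result <- data.frame(
    a = labels[pairs[1, ]],
    b = labels[pairs[2, ]],
    mean_dist = means,
    stringsAsFactors = FALSE
  )

  # Highest mean first
  result <- result[order(result$mean_dist, decreasing = TRUE), ]
  rownames(result) <- NULL
  result
}

# Same data layout as in the question: 100 rows split into chunks of 10
list_to_split <- data.frame(var_1 = rnorm(100), var_2 = rnorm(100), var_3 = rnorm(100))
nested_list <- split(list_to_split, rep(1:10, each = 10))

# Only call gower if it is installed
if (requireNamespace("gower", quietly = TRUE)) {
  res <- pairwise_mean_dist(nested_list, gower::gower_dist)
  print(head(res))
}
```

With ten chunks this evaluates 45 pairs rather than the 100 rows of the full cross product, and the first row of the result is the pair with the highest mean.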