R Manipulating List of Lists With Conditions / Joining Data

Time:02-26

I have the following data showing 5 possible kids to invite to a party and what neighborhoods they live in.

I also have a list of solutions (binary indicators of whether each kid is invited or not; e.g., the first solution invites Kelly, Gina, and Patty).

data <- data.frame(c("Kelly", "Andrew", "Josh", "Gina", "Patty"), c(1, 1, 0, 1, 0), c(0, 1, 1, 1, 0))
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

I'm looking for a way to now filter the solutions in the following ways:

a) Only keep solutions where there are at least 3 kids from both neighborhood A and neighborhood B (one kid can count as one for both if they're part of both)

b) Only keep solutions that have at least 3 kids selected (i.e., sum >= 3)

I think I need to somehow join the data frame to each solution in the solutions list, but I'm a bit lost on how to manipulate everything since the solutions are stuck in a list. Basically, I'm looking for a way to add entries to every solution in the list indicating a) how many kids the solution has, b) how many kids are from neighborhood A, and c) how many kids are from neighborhood B. From there, I'd have to filter the list to keep only the solutions that satisfy >= 3.

Thank you in advance!

CodePudding user response:

I wrote a little function to check each solution and return TRUE or FALSE based on your requirements. Passing your solutions to this using sapply() will give you a logical vector, with which you can subset solutions to retain only those that met the requirements.

check_solution <- function(solution, data) {
  data <- data[as.logical(solution),]
  sum(data[["Neighborhood A"]]) >= 3 && sum(data[["Neighborhood B"]]) >= 3
}
### No need for the function to test whether `sum(solution) >= 3`, since
### this will *always* be true if either neighborhood's sum is >= 3.

tests <- sapply(solutions, check_solution, data = data)
# FALSE FALSE FALSE FALSE FALSE

solutions[tests]
# list()

### none of the `solutions` provided actually meet criteria
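
To see why nothing passes, it can help to tabulate the per-solution counts the question asked about. Below is a minimal base R sketch, assuming the two-neighborhood data and solutions from the question; the names sol_mat, nbhd_mat, counts, and keep are made up for illustration and are not part of the original answer.

```r
data <- data.frame(
  Kid = c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
  `Neighborhood A` = c(1, 1, 0, 1, 0),
  `Neighborhood B` = c(0, 1, 1, 1, 0),
  check.names = FALSE  # keep the space in the column names
)
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

# one row per solution, one column per kid (illustrative name)
sol_mat <- do.call(rbind, solutions)
# one row per kid, one column per neighborhood (illustrative name)
nbhd_mat <- as.matrix(data[, c("Neighborhood A", "Neighborhood B")])

# the matrix product counts the invited kids in each neighborhood;
# rowSums() counts the invited kids overall
counts <- cbind(n_kids = rowSums(sol_mat), sol_mat %*% nbhd_mat)
counts

# keep solutions with at least 3 kids overall and in both neighborhoods
keep <- counts[, "n_kids"] >= 3 &
  counts[, "Neighborhood A"] >= 3 &
  counts[, "Neighborhood B"] >= 3
solutions[keep]
# list()
```

With this data, no solution has more than 2 invited kids in either neighborhood, which is why the filtered list is empty.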

Edit: OP asked in the comments how to test against all neighborhoods in the data, and return TRUE if a specified number of neighborhoods have enough kids. Below is a solution using dplyr.

library(dplyr)

data <- data.frame(
  c("Kelly", "Andrew", "Josh", "Gina", "Patty"), 
  c(1, 1, 0, 1, 0), 
  c(0, 1, 1, 1, 0),
  c(1, 1, 1, 0, 1),
  c(0, 1, 1, 1, 1)
)
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B", "Neighborhood C", 
                 "Neighborhood D")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), 
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

check_solution <- function(solution, 
                           data, 
                           min_kids = 3, 
                           min_neighborhoods = NULL) {
  neighborhood_tests <- data %>% 
    filter(as.logical(solution)) %>% 
    summarize(across(starts_with("Neighborhood"), ~ sum(.x) >= min_kids)) %>% 
    as.logical()
  # require all neighborhoods by default
  if (is.null(min_neighborhoods)) min_neighborhoods <- length(neighborhood_tests)
  sum(neighborhood_tests) >= min_neighborhoods
}

tests1 <- sapply(solutions, check_solution, data = data)
solutions[tests1]
# list()

tests2 <- sapply(
  solutions, 
  check_solution, 
  data = data, 
  min_kids = 2, 
  min_neighborhoods = 3
)
solutions[tests2]
# [[1]]
# [1] 1 0 0 1 1
# 
# [[2]]
# [1] 0 1 0 1 1
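
If you'd rather not depend on dplyr, the same check can probably be written in base R. This is only a sketch under the same assumption that every neighborhood column name starts with "Neighborhood"; check_solution_base and tests_base are hypothetical names, not something from the answer above.

```r
data <- data.frame(
  Kid = c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
  `Neighborhood A` = c(1, 1, 0, 1, 0),
  `Neighborhood B` = c(0, 1, 1, 1, 0),
  `Neighborhood C` = c(1, 1, 1, 0, 1),
  `Neighborhood D` = c(0, 1, 1, 1, 1),
  check.names = FALSE  # keep the space in the column names
)
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

check_solution_base <- function(solution,
                                data,
                                min_kids = 3,
                                min_neighborhoods = NULL) {
  # select the neighborhood columns by name prefix
  nbhd_cols <- startsWith(names(data), "Neighborhood")
  # keep only the rows for kids invited in this solution
  invited <- data[as.logical(solution), nbhd_cols, drop = FALSE]
  # one TRUE/FALSE per neighborhood
  neighborhood_tests <- colSums(invited) >= min_kids
  # require all neighborhoods by default
  if (is.null(min_neighborhoods)) min_neighborhoods <- length(neighborhood_tests)
  sum(neighborhood_tests) >= min_neighborhoods
}

tests_base <- vapply(solutions, check_solution_base, logical(1),
                     data = data, min_kids = 2, min_neighborhoods = 3)
which(tests_base)
# [1] 1 3
```

Using vapply() instead of sapply() here just guarantees a logical vector even if solutions is empty; the selected solutions are the same two as in the dplyr version.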