Nested ifelse() or case_when() for unknown number of queries in R

I have a data frame whose rows I would like to assign to groups according to the value in the Position column:

my_data <- data.frame(matrix(ncol = 3, nrow = 4))
colnames(my_data) <- c('Position', 'Group', 'Data')
                      
my_data[,1] <- c('A1','B1','C1','D1')
my_data[,3] <- c(1,2,3,4)

grps <- list(c('A1','B1'),
             
             c('C1','D1'))

grp.names = c("Control", "Exp1", "EMPTY")


library(dplyr)  # provides case_when()

my_data$Group <- case_when(
  my_data$Position %in% grps[[1]] ~ grp.names[1],
  my_data$Position %in% grps[[2]] ~ grp.names[2]
)

my_data$Group <- with(my_data, ifelse(Position %in% grps[[1]], grp.names[1],
                                    ifelse(Position %in% grps[[2]], grp.names[2], 
                                    grp.names[3])))

These examples work and produce a Group column with appropriate labels. However, I need flexibility in the length of the grps list, from 1 to approximately 25 elements.

I see no way to iterate through case_when or ifelse in a for loop, e.g.

my_data$Group <- for (i in 1:length(grps)){
  case_when(
    my_data$Position %in% grps[[i]] ~ grp.names[i])
}

This example simply deletes the Group column.

What is the most appropriate way to handle a variable grps length?

CodePudding user response：

I believe your question implies that the grps variable is a list and that every element in that list is itself a character vector holding all the positions that belong to that group.

Specifically, with the grps variable below, a Position of "A1" or "B1" belongs to the first entry of grp.names, and a Position of "C1" or "D1" belongs to the second entry of grp.names:

> grps
[[1]]
[1] "A1" "B1"

[[2]]
[1] "C1" "D1"

Assuming that to be the case, you can do the following:

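# One logical column per group: TRUE where the position is in that group
matching_group_df <- sapply(grps, function(x){ my_data$Position %in% x})
# Column index of the TRUE in each row (assumes exactly one match per row)
selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})
# Translate the group indices into labels
my_data$Group <- grp.names[selected_group]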

  Position   Group Data
1       A1 Control    1
2       B1 Control    2
3       C1    Exp1    3
4       D1    Exp1    4

The way it works is as follows:

matching_group_df is a logical matrix (created by the sapply function) with one row per position and one column per group; TRUE marks the group that the position belongs to:

> matching_group_df
      [,1]  [,2]
[1,]  TRUE FALSE
[2,]  TRUE FALSE
[3,] FALSE  TRUE
[4,] FALSE  TRUE

You then find, row by row, the index of the column that holds the TRUE value, using an apply call:

selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})

> selected_group
[1] 1 1 2 2

Finally, you use those indices to subset the grp.names vector, which selects the appropriate label for each row, and assign the result to your original data frame.

grp.names[selected_group]
[1] "Control" "Control" "Exp1"    "Exp1"

This also has the small side benefit of just using base R functions if that is important to you.
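
One caveat is worth sketching: apply() with which() only returns a plain integer vector when every position matches exactly one group. The variant below is a minimal sketch, not part of the original answer. It takes the first match per row and falls back to the last entry of grp.names for unmatched positions. The extra "E1" row and the choice of the last label as the fallback are assumptions made for illustration.

```r
# Example data from the question, plus a hypothetical unmatched position "E1"
my_data <- data.frame(Position = c("A1", "B1", "C1", "D1", "E1"),
                      Data = 1:5)
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")

# One logical column per group; matrix() keeps the shape even for a single group
in_group <- matrix(
  unlist(lapply(grps, function(g) my_data$Position %in% g)),
  nrow = nrow(my_data)
)

# Index of the first matching group per row, NA when no group matches
first_match <- apply(in_group, 1, function(row) which(row)[1])

# Assumption: the last label in grp.names is the fallback for unmatched rows
first_match[is.na(first_match)] <- length(grp.names)

my_data$Group <- grp.names[first_match]
```

With this data, the first four rows get the same labels as before and the "E1" row gets "EMPTY".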

CodePudding user response：

Approach 1: Hash table

I would opt for a different approach here, since the makeup of the groups might change during the analysis: build a lookup table of key-value pairs and write a small accessor function.

library(tidyverse)

# First, a small adjustment to `grps` to reflect an empty group.
grps <- list(c('A1','B1'),
             c('C1','D1'),
             NULL)
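# Keys: every position; values: the label of the group it belongs to
names <- unlist(grps, use.names = FALSE)
values <- rep(grp.names, map_dbl(grps, length))

# Store the key-value pairs in an environment that serves as the lookup table
h <- as.list(values) %>%
  set_names(names) %>%
  list2env()

# Accessor: look up each x in h (h is found through lexical scoping)
f <- Vectorize(function(x) h[[x]], c("x"))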

This takes some time to set up, but usage is quite convenient:

my_data %>%
  mutate(Group = f(Position))

  Position   Group Data
1       A1 Control    1
2       B1 Control    2
3       C1    Exp1    3
4       D1    Exp1    4

This avoids having to change your code in multiple places, and it handles any number of groups.
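
For comparison, the same key-value idea can be sketched in base R with a named character vector instead of an environment. This is an alternative illustration, not the code above. The names lookup and label_positions are hypothetical, and treating "EMPTY" as the label for unknown keys is an assumption.

```r
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")

# Keys are positions, values are the label of the group each position sits in
lookup <- setNames(
  rep(grp.names[seq_along(grps)], lengths(grps)),
  unlist(grps, use.names = FALSE)
)

# Hypothetical accessor: vectorised lookup with a fallback for unknown keys
label_positions <- function(positions, fallback = "EMPTY") {
  out <- unname(lookup[as.character(positions)])
  out[is.na(out)] <- fallback
  out
}

result <- label_positions(c("A1", "D1", "Z9"))
```

Indexing a named vector by a key it does not contain yields NA, which is what the fallback step replaces.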

Approach 2: Dynamic switch

Alternatively, we can make an arbitrary length switch expression, building it from the group names and their unique values.

constructor <- function(ids, names){
  purrr::imap_chr(as.character(ids), ~paste(paste0("\"", .x ,"\""),
                                            paste0("\"", names[.y], "\""),
                                            sep = "=")) %>%
    paste0(collapse = ", ") %>%
    paste0("Vectorize(function(x) switch(as.character(x), ", ., ", NA))", collapse = "") %>%
    str2expression()
}

my_data %>%
  mutate(Group = eval(constructor(names, values))(Position))

In this case, it would evaluate the expression

expression(Vectorize(function(x) switch(as.character(x), A1 = "Control", 
    B1 = "Control", C1 = "Exp1", D1 = "Exp1", 
    NA)))
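
To make the generated expression less opaque, here is a hand-written sketch of the kind of function it builds for the four positions in the example. The name label_one is hypothetical, and NA_character_ is used in place of NA so that vapply() can check the return type.

```r
# Hand-written equivalent of the generated switch for the example groups
label_one <- function(x) {
  switch(as.character(x),
         A1 = "Control",
         B1 = "Control",
         C1 = "Exp1",
         D1 = "Exp1",
         NA_character_)  # the unnamed last argument is the default
}

# Apply it to each position; "Z9" is a made-up position with no group
labels <- vapply(c("A1", "D1", "Z9"), label_one, character(1),
                 USE.NAMES = FALSE)
```

Positions without a matching name fall through to the default, so "Z9" is labelled NA here.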

CodePudding user response：

For each item in my_data$Position, go through each element of grps and look for a match; if there is one, assign the corresponding entry of grp.names. If no group matches, assign grp.names[3]:

my_data$Group <- vapply(my_data$Position, function(position){ # Goes through each my_data$Position
  for(i in seq_along(grps)){
    if(position %in% grps[[i]]){
      return(grp.names[i]) # Label of the first group containing this position
    }
  }
  grp.names[3] # No match in any group: assign grp.names[3]
}, character(1), USE.NAMES = FALSE) # Returns a plain character vector
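
The for loop attempted in the question can also be made to work. A for loop itself evaluates to NULL, and assigning NULL to a data frame column removes it, which is why the Group column disappeared. Below is a minimal sketch that assigns inside the loop instead, assuming the third entry of grp.names is the default label:

```r
my_data <- data.frame(Position = c("A1", "B1", "C1", "D1"),
                      Data = c(1, 2, 3, 4))
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")

# Start every row at the fallback label (assumed here to be grp.names[3])
my_data$Group <- grp.names[3]

# Overwrite the label for the rows whose Position falls in each group
for (i in seq_along(grps)) {
  my_data$Group[my_data$Position %in% grps[[i]]] <- grp.names[i]
}
```

Each pass through the loop handles one group with a vectorised %in%, so the loop runs once per group and not once per row.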