I have a data frame whose rows I would like to assign to groups according to the value in one of its columns (Position):
library(dplyr)  # for case_when()
my_data <- data.frame(matrix(ncol = 3, nrow = 4))
colnames(my_data) <- c('Position', 'Group', 'Data')
my_data[,1] <- c('A1','B1','C1','D1')
my_data[,3] <- c(1,2,3,4)
grps <- list(c('A1','B1'),
             c('C1','D1'))
grp.names = c("Control", "Exp1", "EMPTY")
my_data$Group <- case_when(
  my_data$Position %in% grps[[1]] ~ grp.names[1],
  my_data$Position %in% grps[[2]] ~ grp.names[2]
)
OR
my_data$Group <- with(my_data, ifelse(Position %in% grps[[1]], grp.names[1],
ifelse(Position %in% grps[[2]], grp.names[2],
grp.names[3])))
These examples work and produce a Group column with the appropriate labels; however, I need the length of the grps list to be flexible, anywhere from 1 to approximately 25.
I see no way to iterate through case_when or ifelse in a for loop, e.g.:
my_data$Group <- for (i in 1:length(grps)){
  case_when(
    my_data$Position %in% grps[[i]] ~ grp.names[i])
}
This example simply deletes the Group column.
What is the most appropriate way to handle a variable grps length?
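For reference, the loop above removes the column because a for loop in R always evaluates to NULL, and assigning NULL to a data frame column deletes that column. A minimal base R sketch that keeps the loop idea assigns inside the loop instead. It assumes, as in the question, that the fallback label ("EMPTY") is the last element of grp.names; a position listed in several groups ends up with the label of the last matching group.

```r
my_data <- data.frame(Position = c("A1", "B1", "C1", "D1"), Data = c(1, 2, 3, 4))
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")

# A for loop always evaluates to NULL, so `my_data$Group <- for (...)`
# assigns NULL, which deletes the column.
loop_value <- for (i in 1:2) i
is.null(loop_value)  # TRUE

# Start every row at the fallback label (assumed to be the last grp.names entry),
# then overwrite it group by group inside the loop; works for any length(grps)
my_data$Group <- grp.names[length(grp.names)]
for (i in seq_along(grps)) {
  my_data$Group[my_data$Position %in% grps[[i]]] <- grp.names[i]
}
my_data
```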
Answer:
I believe your question implies that the grps variable is a list in which every element is itself a character vector holding all the positions that belong to that group.
Specifically, in your grps variable below, if the Position is "A1" or "B1", it belongs to whatever your first entry in grp.names is. Similarly, if the Position is "C1" or "D1", it belongs to whatever your second entry in grp.names is:
> grps
[[1]]
[1] "A1" "B1"
[[2]]
[1] "C1" "D1"
Assuming that to be the case you can do the following:
matching_group_df <- sapply(grps, function(x){ my_data$Position %in% x})
selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})
my_data$Group <- grp.names[selected_group]
Position Group Data
1 A1 Control 1
2 B1 Control 2
3 C1 Exp1 3
4 D1 Exp1 4
The way it works is as follows:
- matching_group_df is a logical (TRUE/FALSE) matrix, created via the sapply function, with one row per position and one column per group; it specifies which group index each position belongs to:
> matching_group_df
[,1] [,2]
[1,] TRUE FALSE
[2,] TRUE FALSE
[3,] FALSE TRUE
[4,] FALSE TRUE
- You then select, row by row, the column that holds the TRUE value using an apply command:
selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})
> selected_group
[1] 1 1 2 2
- Finally, you pass those indices to your grp.names vector to select the appropriate labels and store them in your original data frame.
grp.names[selected_group]
[1] "Control" "Control" "Exp1" "Exp1"
This also has the small side benefit of using only base R functions, if that is important to you.
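One caveat with the steps above: if a position matches no group, which() returns integer(0) for that row, so apply() returns a list instead of a vector and indexing grp.names with it fails. A hedged variant of the same three steps takes the first match per row and falls back to the last label (assumed here to be "EMPTY") otherwise; the "E1" row is hypothetical test data added only to show the unmatched case.

```r
my_data <- data.frame(Position = c("A1", "B1", "C1", "D1", "E1"), Data = 1:5)
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")

# One logical column per group: TRUE where the position belongs to that group
matching_group_df <- sapply(grps, function(x) my_data$Position %in% x)

# First matching group index per row, or NA when the row matches no group
selected_group <- apply(matching_group_df, 1,
                        function(x) if (any(x)) which(x)[1] else NA_integer_)

# Unmatched rows get the last label, assumed to be the "EMPTY" fallback
my_data$Group <- ifelse(is.na(selected_group),
                        grp.names[length(grp.names)],
                        grp.names[selected_group])
my_data
```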
Answer:
Approach 1: Hash table
I would opt for a different approach here, since group membership might change during analysis: a lookup table of key-value pairs, plus a small accessor function.
library(tidyverse)
# First, a small adjustment to `grps` to reflect an empty group.
grps <- list(c('A1','B1'),
             c('C1','D1'),
             NULL)
# Keys are the positions; values are the group labels, repeated once per member
names <- unlist(grps, use.names = FALSE)
values <- rep(grp.names, map_dbl(grps, length))
h <- as.list(values) %>%
  set_names(names) %>%
  list2env()
# find x in h (h itself is found through lexical scoping)
f <- Vectorize(function(x) h[[x]], c("x"))
This takes some time to set up, but usage is quite convenient:
my_data %>%
  mutate(Group = f(Position))
Position Group Data
1 A1 Control 1
2 B1 Control 2
3 C1 Exp1 3
4 D1 Exp1 4
This avoids having to change your code in multiple places and handles any number of groups.
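If the environment is not needed, a plain named character vector can serve as the same kind of lookup table in base R. This is a swapped-in alternative rather than the approach above: names are the positions, values are the labels, and positions that are in no group come back as NA, which the sketch then replaces with "EMPTY" (an assumed fallback).

```r
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")
my_data <- data.frame(Position = c("A1", "B1", "C1", "D1"), Data = c(1, 2, 3, 4))

# lengths() gives the size of each group, so each label is repeated once per member
lookup <- setNames(rep(grp.names[seq_along(grps)], lengths(grps)), unlist(grps))

# Index by name; as.character() guards against Position being a factor
my_data$Group <- unname(lookup[as.character(my_data$Position)])

# Positions missing from every group are NA; replace them with the fallback label
my_data$Group[is.na(my_data$Group)] <- "EMPTY"
my_data
```

Adding or removing a group only changes grps and grp.names; the lookup line stays the same.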
Approach 2: Dynamic switch
Alternatively, we can build a switch expression of arbitrary length from the positions and their group labels.
constructor <- function(ids, names){
purrr::imap_chr(as.character(ids), ~paste(paste0("\"", .x ,"\""),
paste0("\"", names[.y], "\""),
sep = "=")) %>%
paste0(collapse = ", ") %>%
paste0("Vectorize(function(x) switch(as.character(x), ", ., ", NA))", collapse = "") %>%
str2expression()
}
my_data %>%
  mutate(Group = eval(constructor(names, values))(Position))
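In this case, constructor(names, values) builds the following expression, which evaluates to a vectorized lookup function that is then applied to Position: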
expression(Vectorize(function(x) switch(as.character(x), A1 = "Control",
B1 = "Control", C1 = "Exp1", D1 = "Exp1",
NA)))
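A related option that avoids pasting strings together is to generate the case_when() conditions from the question directly: dplyr's case_when() accepts a list of formulas spliced in with rlang's !!! operator. The sketch below assumes a recent dplyr (with rlang installed) and uses a hypothetical "E1" row to show the TRUE ~ "EMPTY" fallback.

```r
library(dplyr)

my_data <- data.frame(Position = c("A1", "B1", "C1", "D1", "E1"), Data = 1:5)
grps <- list(c("A1", "B1"), c("C1", "D1"))
grp.names <- c("Control", "Exp1", "EMPTY")

# One `Position %in% <members> ~ <label>` expression per group;
# !! injects the actual member vector and label into each expression
conditions <- Map(function(members, label) rlang::expr(Position %in% !!members ~ !!label),
                  grps, grp.names[seq_along(grps)])

# !!! splices all the conditions into case_when(); TRUE ~ ... is the fallback
my_data <- my_data %>%
  mutate(Group = case_when(!!!conditions, TRUE ~ "EMPTY"))
my_data
```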
Answer:
For each element of my_data$Position, you want to go through each of the grps, look for a match, and, if one is found, assign the corresponding grp.names entry. If there is no match in any group, assign grp.names[3]:
my_data$Group <- vapply(my_data$Position, function(position){ # Goes through each my_data$Position
  for(i in seq_along(grps)){
    if(position %in% grps[[i]]){
      return(grp.names[i]) # The matching index of grps selects the grp.names entry
    }
  }
  grp.names[3] # No group matched: assign grp.names[3] ("EMPTY")
}, character(1), USE.NAMES = FALSE) # Returns a plain character vector