Sorry, I don't think my question embodied what I'm trying to ask well, but hopefully, my explanation will clarify.
Say I have a reference data frame object, "df", which looks something like this:
df <- data.frame(v1 = rep("x", 5), direction = c(-3, 5, -2, 1, 4), cancer = c("can1", "can2", "can1", "can3", "can2"))
Which would appear as:
v1 direction cancer
1 x -3 can1
2 x 5 can2
3 x -2 can1
4 x 1 can3
5 x 4 can2
I am writing a function that subsets the rows by direction for each cancer specified, where the direction can be "up" (positive values), "down" (negative values), or "both". This works well enough when only one cancer or one direction of interest is specified:
if (length(direction) == 1) { # only one cancer, or the same direction for all cancers specified
  if (direction == "both") {
    dir <- df # df has already been subsetted to the cancers of interest
  } else if (direction == "down") {
    dir <- df[grep("-", df$direction), ] # keep rows with a negative direction
  } else if (direction == "up") {
    dir <- df[grep("-", df$direction, invert = TRUE), ] # keep rows without a minus sign
  }
}
My problem arises when I need to specify different directions for different cancers. For example, I have subsetted to can1 and can2, but want the up direction for can1 and the down direction for can2. I imagine the function would take a vector of cancers and a vector of directions of the same length as arguments, e.g., cancer = c("can1", "can2"), direction = c("up", "down"), and would keep only the up rows for can1 and only the down rows for can2. How do you recommend I go about this?
I can see writing a for loop in which I specify
for (i in seq_along(direction)) {}
but I'm not sure how to get it to correspond to the right cancer in the cancer argument vector and subset both correctly then store in one dataframe.
Thanks so much in advance, please let me know if you need any clarification!
CodePudding user response:
Here is a possible solution using data.table (the same can be achieved with subset and rbind in base R):
library(data.table)
DT <- data.table(v1 = "x", direction = c(-3, 5, -2, 1, 4),
cancer = c("can1", "can2", "can1", "can3", "can2"))
myfun <- function(dir, can) {
  # Pad dir with its last value when fewer directions than cancers are given
  if (length(dir) < length(can)) dir <- c(dir, rep(dir[length(dir)], length(can) - length(dir)))
  # Map each direction name to the sign it selects
  direct <- setNames(c(-1, 1), c("down", "up"))
  rbindlist(lapply(seq_along(can), function(x) {
    ddir <- if (dir[x] == "both") direct else direct[dir[x]]
    DT[cancer == can[x] & sign(direction) %in% ddir]
  }))
}
myfun(c("down", "up"), c("can1", "can2"))
#> v1 direction cancer
#> 1: x -3 can1
#> 2: x -2 can1
#> 3: x 5 can2
#> 4: x 4 can2
myfun(c("up", "down"), c("can1", "can2"))
#> Empty data.table (0 rows and 3 cols): v1,direction,cancer
myfun("both", c("can1", "can3"))
#> v1 direction cancer
#> 1: x -3 can1
#> 2: x -2 can1
#> 3: x 1 can3
Created on 2022-03-07 by the reprex package (v2.0.1)
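For the base R route mentioned above, a minimal sketch with subset and rbind might look like the following. The function name subset_by_direction and its argument names are made up for illustration; the logic mirrors myfun, including padding the direction vector with its last value and dropping rows whose direction is exactly zero.

```r
df <- data.frame(
  v1 = rep("x", 5),
  direction = c(-3, 5, -2, 1, 4),
  cancer = c("can1", "can2", "can1", "can3", "can2")
)

# Hypothetical helper (not from the original answer): one direction per cancer
subset_by_direction <- function(data, cancers, dirs) {
  # Pad dirs with its last value when fewer directions than cancers are given
  if (length(dirs) < length(cancers)) {
    dirs <- c(dirs, rep(dirs[length(dirs)], length(cancers) - length(dirs)))
  }
  # Map each direction name to the sign it selects
  signs <- c(down = -1, up = 1)
  pieces <- lapply(seq_along(cancers), function(i) {
    keep <- if (dirs[i] == "both") signs else signs[dirs[i]]
    subset(data, cancer == cancers[i] & sign(direction) %in% keep)
  })
  # Stack the per-cancer subsets into one data frame
  do.call(rbind, pieces)
}

subset_by_direction(df, c("can1", "can2"), c("down", "up"))
```

The result keeps the original row names, so the rows for can1 come first, followed by those for can2.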
CodePudding user response:
This function lets you specify dir as "up", "down", or "both", and cancer_type as "all" or a single cancer ("can1", "can2", or "can3"). A direction of zero counts as "up":
library(tidyverse)
filterdf <- function(dir = "both", cancer_type = "all") {
  df %>%
    mutate(dir = sapply(df$direction, function (x) ifelse(dir == "up", (x >= 0 | dir == "both"), (x < 0 | dir == "both")))) %>%
    filter(
      if (cancer_type == "all") {
        dir == TRUE
      } else {
        cancer == cancer_type & dir == TRUE
      }
    ) %>%
    select(-dir)
}
> filterdf()
# A tibble: 5 x 3
v1 direction cancer
<chr> <dbl> <chr>
1 x -3 can1
2 x 5 can2
3 x -2 can1
4 x 1 can3
5 x 4 can2
> filterdf("both", "can3")
# A tibble: 1 x 3
v1 direction cancer
<chr> <dbl> <chr>
1 x 1 can3
> filterdf("up", "can2")
# A tibble: 2 x 3
v1 direction cancer
<chr> <dbl> <chr>
1 x 5 can2
2 x 4 can2
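The function above takes a single direction and a single cancer type per call. To cover the case from the question, where each cancer gets its own direction, one option is to join the data against a small lookup table of (cancer, direction) pairs and filter on the result. This is only a sketch: filter_pairs, wanted, and want are hypothetical names, and it assumes each cancer appears at most once in the cancers vector (a repeated cancer would duplicate rows).

```r
library(dplyr)

df <- data.frame(
  v1 = rep("x", 5),
  direction = c(-3, 5, -2, 1, 4),
  cancer = c("can1", "can2", "can1", "can3", "can2")
)

# Hypothetical helper (not from the original answer): one direction per cancer
filter_pairs <- function(data, cancers, dirs) {
  stopifnot(
    length(cancers) == length(dirs),
    all(dirs %in% c("up", "down", "both"))
  )
  # Lookup table pairing each cancer with the direction requested for it
  wanted <- data.frame(cancer = cancers, want = dirs)
  data %>%
    inner_join(wanted, by = "cancer") %>% # drops cancers that were not requested
    filter(
      want == "both" |
        (want == "up" & direction >= 0) | # zero counts as "up", as in filterdf
        (want == "down" & direction < 0)
    ) %>%
    select(-want)
}

filter_pairs(df, c("can1", "can2"), c("down", "up"))
```

Unlike a loop over cancers, the join keeps the rows in their original order rather than grouping them by cancer.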