I made a function that takes a data frame as an argument and creates two output data frames based on a threshold value of one of its columns. The two output data frames are named after the input data frame.
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
assign(paste(nm,"_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),], pos=1)
assign(paste(nm,"_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),], pos=1)
}
The function works correctly. I would like to apply it in a loop so that each of my data frames gives two data frames according to my criteria, so I created a list containing my data frames:
listRes <- list(DJ21_T0, DJ24_T0, DJ29_T0, DJ32_T0,
DJ24_DJ21, DJ29_DJ21, DJ32_DJ21,
DJ21_DJ24, DJ29_DJ24, DJ32_DJ24,
DJ21_DJ29, DJ24_DJ29, DJ32_DJ29,
DJ21_DJ32, DJ24_DJ32, DJ29_DJ32,
Rec2_T0, Rec6_T0, Rec9_T0,
Rec2_DJ32, Rec6_DJ32, Rec9_DJ32,
Rec6_Rec2, Rec9_Rec2,
Rec2_Rec6, Rec9_Rec6,
Rec2_Rec9, Rec6_Rec9)
and used the following code:
for (i in 1:length(listRes)){
spliteOverUnder(listRes[[i]])
}
But this creates objects named listRes[[i]]_Overexpr and listRes[[i]]_Underexpr.
I run into the same problem when I write the loop like this:
for (i in listRes){
spliteOverUnder(i)
}
which gives me objects named i_Overexpr and i_Underexpr.
Calling lapply(listRes, spliteOverUnder) doesn't work either.
How do I loop over my function correctly and get objects named after my data frames (DJ21_T0_Overexpr, DJ21_T0_Underexpr, DJ24_T0_Overexpr, DJ24_T0_Underexpr, ..., Rec6_Rec9_Overexpr, Rec6_Rec9_Underexpr)?
I think the deparse(substitute(res)) trick used in my function is the problem: it gives the created objects the name i or listRes[[i]] rather than the name of the data frame at position i in my listRes list of data frames.
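A toy example confirms this diagnosis. substitute() captures the expression written in the call, not the object that expression evaluates to, and list() does not record variable names. The helper show_name() and the data below are hypothetical, made up for illustration only.

```r
# Hypothetical helper for illustration: returns the text of whatever
# expression was passed as `x` (the same trick spliteOverUnder() uses).
show_name <- function(x) deparse(substitute(x))

DJ21_T0 <- data.frame(log2FoldChange = c(2, -3, 0.5))  # toy data
listRes <- list(DJ21_T0)

show_name(DJ21_T0)       # "DJ21_T0": the call contained the object's name
show_name(listRes[[1]])  # "listRes[[1]]": the call contained an indexing expression
is.null(names(listRes))  # TRUE: list() did not keep the name "DJ21_T0"

# Inside a loop, substitute() sees the unevaluated expression `listRes[[i]]`,
# not the value of i, so every iteration yields the same string.
loop_names <- character(0)
for (i in seq_along(listRes)) {
  loop_names <- c(loop_names, show_name(listRes[[i]]))
}
loop_names  # "listRes[[i]]"
```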
Any help is welcome.
Thanks
Answer:
Here's a tidyverse solution that avoids explicit loops by using map instead. Note at the outset that you could probably do the whole thing with grouped or nested data frames and avoid creating the objects at all. But if you do want to create the objects (or perhaps the starting dfs have different numbers of columns, even if they all share the log2FoldChange column), you could do something like the following.
First, some setup to make the example reproducible.
library(tidyverse)
set.seed(42722)
## Names of the example data frames we'll create
## are df_1 ... df_5
df_names <- paste0("df_", 1:5) %>%
set_names()
## We'll make the new dfs by sampling from mtcars
base_df <- as_tibble(mtcars, rownames = "model") %>%
select(model, cyl, hp)
## Create 5 new data frame objects in our environment
df_names %>%
walk(~ assign(x = .x, # each element of df_names in turn
value = sample_n(base_df, 10),
envir = .GlobalEnv))
## Now we have, e.g.
df_1
#> # A tibble: 10 × 3
#> model cyl hp
#> <chr> <dbl> <dbl>
#> 1 Chrysler Imperial 8 230
#> 2 Mazda RX4 Wag 6 110
#> 3 Merc 450SE 8 180
#> 4 Porsche 914-2 4 91
#> 5 Toyota Corona 4 97
#> 6 Ford Pantera L 8 264
#> 7 Toyota Corolla 4 65
#> 8 Merc 280C 6 123
#> 9 Duster 360 8 245
#> 10 Merc 230 4 95
Next, get these five data frames and put them in a list, which is where the question starts.
df_list <- map(df_names, get)
Now, working with this list of data frames, we can split each one into over and under. If the split criteria were more complex we could write a function for it, but here we use if_else to create a new column in each data frame based on a threshold value of cyl.
## - a. Create an over_under column in each df in the list,
##   based on whether `cyl` in that particular df is < 5 or not
## - b. Split on this new column.
## - c. Put all the results into a new list called `split_list`
split_list <- df_list %>%
  map(~ mutate(.x,
               over_under = if_else(cyl < 5, "over", "under"))) %>%
  map(~ split(.x, as.factor(.x$over_under)))
Now we have a nested list: each of df_1 to df_5 is split into an over and an under table. We can look at them with, e.g.,
split_list$df_3$under
#> # A tibble: 6 × 4
#> model cyl hp over_under
#> <chr> <dbl> <dbl> <chr>
#> 1 Hornet 4 Drive 6 110 under
#> 2 Hornet Sportabout 8 175 under
#> 3 Maserati Bora 8 335 under
#> 4 Valiant 6 105 under
#> 5 Mazda RX4 Wag 6 110 under
#> 6 Cadillac Fleetwood 8 205 under
This is handy because we can use tab completion in our IDE to investigate the tables in the list.
We could just work with the list like this, or bind the tables by row into one big df, assuming they all have the same columns. But the OP wanted them as separate data frame objects with a suffix _over or _under. So, e.g., to extract all the "over" dfs and make them objects named df_1_over etc., we can do
split_list %>%
  map("over") %>%                            # subset to "over" dfs only
  set_names(nm = ~ paste0(.x, "_over")) %>%  # name each list element
  walk2(.x = names(.),                       # write out each df with its name
        .y = .,
        .f = ~ assign(x = .x,
                      value = as_tibble(.y),
                      envir = .GlobalEnv))
Now in our environment we have e.g.
df_5_over
#> # A tibble: 3 × 4
#> model cyl hp over_under
#> <chr> <dbl> <dbl> <chr>
#> 1 Porsche 914-2 4 91 over
#> 2 Toyota Corona 4 97 over
#> 3 Toyota Corolla 4 65 over
We can get the "under" dfs as objects in the same way.
Again, depending on what is needed, it might make more sense to do the whole thing from start to finish with a single tibble, grouping the data as needed. Or, if we know the original dfs all have the same column layout, bind them by row into one df indexed by their name, like this:
df_all <- bind_rows(df_list, .id = "id")
df_all
#> # A tibble: 50 × 4
#> id model cyl hp
#> <chr> <chr> <dbl> <dbl>
#> 1 df_1 Chrysler Imperial 8 230
#> 2 df_1 Mazda RX4 Wag 6 110
#> 3 df_1 Merc 450SE 8 180
#> 4 df_1 Porsche 914-2 4 91
#> 5 df_1 Toyota Corona 4 97
#> 6 df_1 Ford Pantera L 8 264
#> 7 df_1 Toyota Corolla 4 65
#> 8 df_1 Merc 280C 6 123
#> 9 df_1 Duster 360 8 245
#> 10 df_1 Merc 230 4 95
#> # … with 40 more rows
From there you can group the big df by id, make the over/under measures, etc.
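As a dependency-free sketch of that idea (base R rather than dplyr grouping), a row-bound table with an id column can be labelled and split in one pass. The toy data and the ±1 log2FoldChange threshold, taken from the question's function, are assumptions for illustration; the column name direction is hypothetical.

```r
# Toy stand-in for a row-bound df_all: one id per original data frame
df_all <- data.frame(
  id             = c("DJ21_T0", "DJ21_T0", "DJ21_T0", "DJ24_T0", "DJ24_T0"),
  log2FoldChange = c(2, -3, 0.5, -1.5, 1.2)
)

# Label each row; rows with |log2FoldChange| <= 1 get NA
df_all$direction <- ifelse(df_all$log2FoldChange > 1, "Overexpr",
                    ifelse(df_all$log2FoldChange < -1, "Underexpr", NA))

# Drop unlabelled rows, then make one data frame per id/direction pair,
# named like "DJ21_T0_Overexpr"
kept   <- df_all[!is.na(df_all$direction), ]
pieces <- split(kept, paste(kept$id, kept$direction, sep = "_"))
names(pieces)
```

The result is a named list, so the pieces can be used directly (pieces$DJ21_T0_Overexpr) without creating separate objects.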
Answer:
In the end, the main problem was distinguishing between an object and its name, and remembering that putting data frames into an unnamed list() loses their names. The names() function is therefore very useful here. Be careful to give the names in the same order as the data frames appear in the list.
Create the list containing the data frames:
listRes <- list(DJ21_T0, DJ24_T0, DJ29_T0, DJ32_T0, DJ24_DJ21, DJ29_DJ21, DJ32_DJ21, DJ21_DJ24, DJ29_DJ24, DJ32_DJ24, DJ21_DJ29, DJ24_DJ29, DJ32_DJ29, DJ21_DJ32, DJ24_DJ32, DJ29_DJ32, Rec2_T0, Rec6_T0, Rec9_T0, Rec2_DJ32, Rec6_DJ32, Rec9_DJ32, Rec6_Rec2, Rec9_Rec2, Rec2_Rec6, Rec9_Rec6, Rec2_Rec9, Rec6_Rec9)
Name the data frames in the list:
names(listRes) <- c("DJ21_T0", "DJ24_T0", "DJ29_T0", "DJ32_T0", "DJ24_DJ21", "DJ29_DJ21", "DJ32_DJ21", "DJ21_DJ24", "DJ29_DJ24", "DJ32_DJ24", "DJ21_DJ29", "DJ24_DJ29", "DJ32_DJ29", "DJ21_DJ32", "DJ24_DJ32", "DJ29_DJ32", "Rec2_T0", "Rec6_T0", "Rec9_T0", "Rec2_DJ32", "Rec6_DJ32", "Rec9_DJ32", "Rec6_Rec2", "Rec9_Rec2", "Rec2_Rec6", "Rec9_Rec6", "Rec2_Rec9", "Rec6_Rec9")
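Typing every name twice is error-prone. One alternative, sketched here with two toy data frames standing in for the real ones, is base R's mget(), which looks objects up by name and returns a list already named after them, so only the vector of names has to be typed.

```r
# Toy stand-ins for two of the real data frames
DJ21_T0 <- data.frame(log2FoldChange = c(2, -3, 0.5))
DJ24_T0 <- data.frame(log2FoldChange = c(-1.5, 1.2))

# Type each name once; mget() fetches the objects and names the list
res_names <- c("DJ21_T0", "DJ24_T0")
listRes <- mget(res_names)

names(listRes)  # "DJ21_T0" "DJ24_T0"
```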
Define the function (here also exporting the results to .csv files):
spliteOverUnder <- function(res, nm){
  # assign() creates the object in the global environment and also returns it
  out1 <- assign(paste(nm, "_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),], pos=1)
  out2 <- assign(paste(nm, "_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),], pos=1)
  PATH <- "/home/datawork-lpi-ecoscopa-s/TIPTOP_rnaseq/5-dge/2-rstudio/6-resOrdered/Over_Under/"
  # sep="\t" writes tab-separated text even though the files are named .csv
  write.table(out1, file = paste(PATH, nm, "_Overexpr.csv", sep=""), row.names=FALSE, col.names=TRUE, sep="\t", dec=".", quote=FALSE)
  write.table(out2, file = paste(PATH, nm, "_Underexpr.csv", sep=""), row.names=FALSE, col.names=TRUE, sep="\t", dec=".", quote=FALSE)
}
Call the function in a for loop, passing each data frame together with its name:
for (i in 1:length(listRes)){
  nm <- names(listRes[i])
  spliteOverUnder(listRes[[i]], nm)
}
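A variation worth considering, sketched under assumptions: split_over_under() below is a hypothetical rewrite that returns the two subsets instead of assigning them, so lapply() carries the list names through and no name argument is needed. The toy data frames stand in for the real ones; the ±1 threshold is the one from the question.

```r
# Hypothetical rewrite: return the two subsets instead of assigning them
split_over_under <- function(res, threshold = 1) {
  lfc <- as.numeric(as.character(res$log2FoldChange))
  list(
    Overexpr  = res[which(lfc >  threshold), , drop = FALSE],
    Underexpr = res[which(lfc < -threshold), , drop = FALSE]
  )
}

# Toy named list standing in for listRes
listRes <- list(
  DJ21_T0 = data.frame(log2FoldChange = c(2, -3, 0.5)),
  DJ24_T0 = data.frame(log2FoldChange = c(-1.5, 1.2))
)

# lapply() keeps the list names, so deparse(substitute()) is not needed
split_res <- lapply(listRes, split_over_under)
split_res$DJ21_T0$Overexpr

# Only if separate objects are really wanted: DJ21_T0_Overexpr, etc.
for (nm in names(split_res)) {
  for (part in names(split_res[[nm]])) {
    assign(paste0(nm, "_", part), split_res[[nm]][[part]], envir = globalenv())
  }
}
```

drop = FALSE keeps each subset a data frame even when the input has a single column. The two-argument spliteOverUnder() above could also be driven with invisible(Map(spliteOverUnder, listRes, names(listRes))), which pairs each data frame with its name like the for loop does.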