R: How to loop a function that creates named dataframes over the dataframes contained in a list

I wrote a function that takes a dataframe as its argument and creates two output dataframes according to a threshold value of one of the columns. These two output dataframes are named after the original input dataframe.

spliteOverUnder <- function(res){
  nm <-deparse(substitute(res))
  assign(paste(nm,"_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),], pos=1)
  assign(paste(nm,"_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),], pos=1)
}

The function works correctly. I would like to use a loop on this function so that each of my dataframes gives 2 dataframes according to my criteria, so I created a list that contains my dataframes:

listRes <- list(DJ21_T0, DJ24_T0, DJ29_T0, DJ32_T0,
                DJ24_DJ21, DJ29_DJ21, DJ32_DJ21,
                DJ21_DJ24, DJ29_DJ24, DJ32_DJ24,
                DJ21_DJ29, DJ24_DJ29, DJ32_DJ29,
                DJ21_DJ32, DJ24_DJ32, DJ29_DJ32,
                Rec2_T0, Rec6_T0, Rec9_T0,
                Rec2_DJ32, Rec6_DJ32, Rec9_DJ32,
                Rec6_Rec2, Rec9_Rec2,
                Rec2_Rec6, Rec9_Rec6,
                Rec2_Rec9, Rec6_Rec9)

and used the following code:

for (i in 1:length(listRes)){
  spliteOverUnder(listRes[[i]])
}

But this gives me the objects listRes[[i]]_Overexpr and listRes[[i]]_Underexpr. I encounter the same problem when I write the loop like this:

for (i in listRes){
  spliteOverUnder(i)
}

Which gives me the objects i_Overexpr and i_Underexpr.

lapply(listRes, spliteOverUnder) doesn't work either...

How can I loop my function correctly and get the objects corresponding to my dataframes (DJ21_T0_Overexpr, DJ21_T0_Underexpr, DJ24_T0_Overexpr, DJ24_T0_Underexpr, ... , Rec6_Rec9_Overexpr, Rec6_Rec9_Underexpr)?

I think the trick deparse(substitute(res)) used in my function is problematic, giving the created objects the name i or listRes[[i]] rather than giving the name of the dataframe at position i in my listRes dataframe list.
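The suspicion above can be checked with a small sketch. `substitute()` captures the expression written at the call site, not the name the object had before it went into the list, so inside a loop it only ever sees `listRes[[i]]` or `i`. The names `show_name`, `df_a` and `lst` below are made up for illustration.

```r
# Hypothetical helper: returns the text of the expression passed as `res`
show_name <- function(res) deparse(substitute(res))

df_a <- data.frame(x = 1:3)   # toy data frame, not from the question
lst  <- list(df_a)            # an unnamed list, built as in the question

# Called directly, the expression is the object's name
show_name(df_a)
#> [1] "df_a"

# Called in an index loop, the expression is the subsetting call
for (i in seq_along(lst)) print(show_name(lst[[i]]))
#> [1] "lst[[i]]"

# Called in an element loop, the expression is the loop variable
for (d in lst) print(show_name(d))
#> [1] "d"
```

So the function itself behaves as designed; the original name simply has to be passed in some other way.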

Any help is welcome.

Thanks

CodePudding user response：

Here's a tidyverse solution that avoids the need to explicitly write loops, using map instead. Note at the outset that you could probably do the whole thing using grouped or nested data frames, thus avoiding the need to create the objects. But if you do want to create the objects (or perhaps the starting dfs have different numbers of columns, even if they all have the log2FoldChange column) then you could do something like the following.

First, some setup to make the example reproducible.

library(tidyverse)
set.seed(42722)

## Names of the example data frames we'll create
## are df_1 ... df_5
df_names <- paste0("df_", 1:5) %>% 
  set_names()

## We'll make the new dfs by sampling from mtcars
base_df <- as_tibble(mtcars, rownames = "model") %>% 
  select(model, cyl, hp)

## Create 5 new data frame objects in our environment 
df_names %>% 
  walk(~ assign(x = .x,         # each element of df_names in turn
                value = sample_n(base_df, 10), 
                envir = .GlobalEnv))

## Now we have, e.g.
df_1
#> # A tibble: 10 × 3
#>    model               cyl    hp
#>    <chr>             <dbl> <dbl>
#>  1 Chrysler Imperial     8   230
#>  2 Mazda RX4 Wag         6   110
#>  3 Merc 450SE            8   180
#>  4 Porsche 914-2         4    91
#>  5 Toyota Corona         4    97
#>  6 Ford Pantera L        8   264
#>  7 Toyota Corolla        4    65
#>  8 Merc 280C             6   123
#>  9 Duster 360            8   245
#> 10 Merc 230              4    95

Next, get these five data frames and put them in a list, which is where the question starts from.

df_list <- map(df_names, get)

Now, working with this list of data frames, we can split each one into over and under. If the split criteria were more complex we could write a function to do it. But here we use if_else to create a new column in each data frame based on a threshold value of cyl.

## - a. Create an over_under column in each df in the list, 
##      based on whether `cyl` in that particular df is < 5 or not
## - b. Split on this new column.
## - c. Put all the results into a new list called `split_list`

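## Note: rows with cyl < 5 are labelled "over", which is what the
## comment above and the printed output below correspond to
split_list <- df_list %>% 
  map(~ mutate(., 
               over_under = if_else(.$cyl < 5, "over", "under"))) %>% 
  map(~ split(., as.factor(.$over_under)))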

Now we have a nested list. Each of df_1 to df_5 is split into an over table and an under table. We can look at them by e.g.


split_list$df_3$under

#> # A tibble: 6 × 4
#>   model                cyl    hp over_under
#>   <chr>              <dbl> <dbl> <chr>     
#> 1 Hornet 4 Drive         6   110 under     
#> 2 Hornet Sportabout      8   175 under     
#> 3 Maserati Bora          8   335 under     
#> 4 Valiant                6   105 under     
#> 5 Mazda RX4 Wag          6   110 under     
#> 6 Cadillac Fleetwood     8   205 under

This is handy because we can use tab completion in our IDE to investigate the tables in the list.
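If the split criterion were more complex, as mentioned above, it could be wrapped in a small function. The following is only a sketch under assumptions: it uses the criterion from the question (over above 1, under below -1, everything in between dropped) and assumes a numeric column called log2FoldChange. The helper name `label_lfc`, its `threshold` argument and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: label rows as "over"/"under" and drop the rest
label_lfc <- function(df, threshold = 1) {
  df %>%
    mutate(over_under = case_when(
      log2FoldChange >  threshold ~ "over",
      log2FoldChange < -threshold ~ "under",
      TRUE                        ~ NA_character_
    )) %>%
    filter(!is.na(over_under))
}

# Toy stand-ins for the real result tables
toy_list <- list(
  res_a = data.frame(gene = c("g1", "g2", "g3"),
                     log2FoldChange = c(2.5, -0.3, -1.8)),
  res_b = data.frame(gene = c("g4", "g5"),
                     log2FoldChange = c(0.2, 1.4))
)

# Label each data frame, then split it on the new column
split_toy <- lapply(toy_list, function(d) {
  d <- label_lfc(d)
  split(d, d$over_under)
})

names(split_toy$res_a)
#> [1] "over"  "under"
```

Note that a data frame with no rows below the negative threshold (like `res_b` here) simply has no `under` element after the split.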

We could just work with the list like this. Or we could bind them into a big df, by row, assuming they all have the same columns. But the OP wanted them as separate data frame objects with a suffix _over or _under. So, e.g. to extract all the "over" dfs and make them objects with names df_1_over etc, we can do

split_list %>% 
  map("over") %>%                               # subset to "over" dfs only
  set_names(nm = ~ paste0(.x, "_over")) %>%     # name each list element
  walk2(.x = names(.),                          # write out each df with its name
        .y = .,
        .f = ~ assign(x = .x,
                      value = as_tibble(.y),
                      envir = .GlobalEnv))

Now in our environment we have e.g.

df_5_over

#> # A tibble: 3 × 4
#>   model            cyl    hp over_under
#>   <chr>          <dbl> <dbl> <chr>     
#> 1 Porsche 914-2      4    91 over      
#> 2 Toyota Corona      4    97 over      
#> 3 Toyota Corolla     4    65 over

We can get the "under" dfs as objects in the same way.
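As a possible shortcut, both sets of objects could be written in one go by flattening the nested list and handing it to `list2env()`. This is a base R sketch, assuming a nested list shaped like split_list; the toy data and the names `toy`, `toy_split` and `flat` are made up, and the separator swap assumes the original names contain no dots.

```r
# Toy stand-in for split_list: a named list of lists of data frames
toy <- list(df_1 = head(mtcars[, c("cyl", "hp")], 8),
            df_2 = tail(mtcars[, c("cyl", "hp")], 8))
toy_split <- lapply(toy, function(d) {
  split(d, ifelse(d$cyl < 5, "over", "under"))
})

# Flatten one level: names become "df_1.over", "df_1.under", ...
flat <- unlist(toy_split, recursive = FALSE)

# Replace the "." separator added by unlist() with "_"
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)

# Create one object per list element in the global environment
invisible(list2env(flat, envir = .GlobalEnv))

names(flat)
#> [1] "df_1_over"  "df_1_under" "df_2_over"  "df_2_under"
```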

Again, depending on what was needed it might make more sense to do the whole thing from start to finish using a single tibble and grouping the data as needed. Or, if we know the original dfs all have the same columnar layout, bind them by row into a df indexed by their name, like this:

df_all <- bind_rows(df_list, .id = "id")

df_all

#> # A tibble: 50 × 4
#>    id    model               cyl    hp
#>    <chr> <chr>             <dbl> <dbl>
#>  1 df_1  Chrysler Imperial     8   230
#>  2 df_1  Mazda RX4 Wag         6   110
#>  3 df_1  Merc 450SE            8   180
#>  4 df_1  Porsche 914-2         4    91
#>  5 df_1  Toyota Corona         4    97
#>  6 df_1  Ford Pantera L        8   264
#>  7 df_1  Toyota Corolla        4    65
#>  8 df_1  Merc 280C             6   123
#>  9 df_1  Duster 360            8   245
#> 10 df_1  Merc 230              4    95
#> # … with 40 more rows

From there you can group the big df by id, make the over/under measures, etc.
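A minimal sketch of that grouped approach, assuming all data frames share the same columns. The toy list below is built deterministically from mtcars rather than sampled, and the names `toy_list` and `mean_hp` are illustrative only.

```r
library(dplyr)

# Toy stand-in for df_list: two named data frames with identical columns
base_df  <- tibble::as_tibble(mtcars, rownames = "model")
toy_list <- list(df_1 = base_df[1:10,  c("model", "cyl", "hp")],
                 df_2 = base_df[11:20, c("model", "cyl", "hp")])

# Bind by row, keeping the list names in an `id` column,
# then add the over/under label in a single pass
df_all <- bind_rows(toy_list, .id = "id") %>%
  mutate(over_under = if_else(cyl < 5, "over", "under"))

# Count rows per original data frame and category
df_all %>% count(id, over_under)

# Or summarise within each id/category group
df_all %>%
  group_by(id, over_under) %>%
  summarise(mean_hp = mean(hp), .groups = "drop")
```

With this layout no objects need to be created at all; a single `filter(id == "df_1", over_under == "over")` recovers any individual table.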

CodePudding user response：

In the end, the main problem was distinguishing between an object and its name, and remembering that putting dataframes into a list with list() does not keep their names. The names() function is therefore very useful here. Be careful to supply the names in the same order as the dataframes appear in the list.
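The point about lost names can be seen in a short sketch, together with one way to avoid typing every name twice: `mget()` looks objects up by name and returns a list that is already named. The objects `df_a`, `df_b` and the vector `wanted` are made up for illustration.

```r
df_a <- data.frame(x = 1:2)   # toy data frames
df_b <- data.frame(x = 3:4)

# list() keeps the objects but not the names they had
names(list(df_a, df_b))
#> NULL

# mget() takes the names as strings and returns a named list,
# so names and contents cannot get out of order
wanted     <- c("df_a", "df_b")
named_list <- mget(wanted)
names(named_list)
#> [1] "df_a" "df_b"
```

In the setting of the question, `wanted` would hold the dataframe names such as "DJ21_T0".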

Create the list containing the dataframes:

listRes <- list(DJ21_T0, DJ24_T0, DJ29_T0, DJ32_T0, DJ24_DJ21, DJ29_DJ21, DJ32_DJ21, DJ21_DJ24, DJ29_DJ24, DJ32_DJ24, DJ21_DJ29, DJ24_DJ29, DJ32_DJ29, DJ21_DJ32, DJ24_DJ32, DJ29_DJ32, Rec2_T0, Rec6_T0, Rec9_T0, Rec2_DJ32, Rec6_DJ32, Rec9_DJ32, Rec6_Rec2, Rec9_Rec2, Rec2_Rec6, Rec9_Rec6, Rec2_Rec9, Rec6_Rec9)

Name the dataframes in the list:

names(listRes) <- c("DJ21_T0", "DJ24_T0", "DJ29_T0", "DJ32_T0", "DJ24_DJ21", "DJ29_DJ21", "DJ32_DJ21", "DJ21_DJ24", "DJ29_DJ24", "DJ32_DJ24", "DJ21_DJ29", "DJ24_DJ29", "DJ32_DJ29", "DJ21_DJ32", "DJ24_DJ32", "DJ29_DJ32", "Rec2_T0", "Rec6_T0", "Rec9_T0", "Rec2_DJ32", "Rec6_DJ32", "Rec9_DJ32", "Rec6_Rec2", "Rec9_Rec2", "Rec2_Rec6", "Rec9_Rec6", "Rec2_Rec9", "Rec6_Rec9")

Define the function (here with export to .csv):

spliteOverUnder <- function(res, nm){
  out1 <- assign(paste(nm,"_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),], pos=1)
  out2 <- assign(paste(nm,"_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),], pos=1)
  PATH <- "/home/datawork-lpi-ecoscopa-s/TIPTOP_rnaseq/5-dge/2-rstudio/6-resOrdered/Over_Under/"
  write.table(out1, file = paste(PATH,nm,"_Overexpr.csv", sep=""), row.names=FALSE , col.names=TRUE, sep="\t", dec=".", quote=FALSE)
  write.table(out2, file = paste(PATH,nm,"_Underexpr.csv", sep=""), row.names=FALSE , col.names=TRUE, sep="\t", dec=".", quote=FALSE)
}

Call the function in a for loop:

for (i in 1:length(listRes)){
  nm <- names(listRes[i])
  spliteOverUnder(listRes[[i]], nm)
}
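Once the list is named, the index bookkeeping is optional. The sketch below loops over the names directly, or lets `Map()` pair each dataframe with its name. It is self-contained, so it uses a simplified stand-in called `split_named` (no file export) and toy data; both are hypothetical and not part of the code above.

```r
# Simplified stand-in for the two-argument function (no file export)
split_named <- function(res, nm) {
  lfc <- as.numeric(as.character(res$log2FoldChange))
  assign(paste0(nm, "_Overexpr"),
         res[which(lfc > 1), , drop = FALSE], envir = .GlobalEnv)
  assign(paste0(nm, "_Underexpr"),
         res[which(lfc < -1), , drop = FALSE], envir = .GlobalEnv)
  invisible(NULL)
}

# Toy named list standing in for listRes
toy_res <- list(
  A_T0 = data.frame(gene = c("g1", "g2", "g3"), log2FoldChange = c(2, 0, -3)),
  B_T0 = data.frame(gene = c("g4", "g5"),       log2FoldChange = c(-1.5, 0.5))
)

# Option 1: loop over the names rather than the positions
for (nm in names(toy_res)) split_named(toy_res[[nm]], nm)

# Option 2: let Map() walk the list and its names in parallel
invisible(Map(split_named, toy_res, names(toy_res)))

nrow(A_T0_Overexpr)
#> [1] 1
```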