R: how to assign multiple variables in a for loop?

Hello, this is my first time using for loops in R, and I'm trying to figure out how to create multiple variables automatically with a loop.

I want to run the following command multiple times, changing the gene involved each time:

gene1_row_quantity_sample1 = sample_1 %>%
  dplyr::filter(grepl("gene1", gene_type)) %>%
  nrow()

In the code above, two things vary: the gene and the sample. The samples are stored in a list:

my_list = list(
sample_1 = read.table(file = "S01.tsv", sep = "\t", header = FALSE),
sample_2 = read.table(file = "S02.tsv", sep = "\t", header = FALSE),
sample_3 = read.table(file = "S03.tsv", sep = "\t", header = FALSE)
...
)

and the genes are stored in a character vector:

genes = c("gene1","gene2","gene3"...)

So how can I run the first snippet in a for loop so that the result for every gene x sample combination is computed and stored, instead of writing each one out manually?

Desired output:

gene1_row_quantity_sample1

"number of rows"

gene2_row_quantity_sample1

"number of rows"

gene3_row_quantity_sample1

"number of rows"

gene1_row_quantity_sample2

"number of rows"

gene2_row_quantity_sample2

"number of rows"

gene3_row_quantity_sample2

"number of rows"

gene1_row_quantity_sample3

"number of rows"

gene2_row_quantity_sample3

"number of rows"

gene3_row_quantity_sample3

"number of rows"

Thanks for your time

CodePudding user response：

I can't be sure this works, since you did not provide a reproducible example, but you could try:

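# For each sample in the list, loop over the genes
lapply(names(my_list), function(x) {
  for(i in genes) {
    # Number of rows in sample x whose gene_type matches gene i
    y <- my_list[[x]] %>%
      dplyr::filter(grepl(i, gene_type)) %>%
      nrow()
    print(paste(i, x, y))
  }
})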
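
If the counts need to be kept rather than printed, one option is to collect them in a single named vector instead of creating one variable per gene/sample pair. This is only a minimal sketch: it assumes each data frame has a gene_type column, the toy data is made up for illustration, and sum(grepl(...)) is used as a base R stand-in for filter(grepl(...)) %>% nrow().

```r
# Toy stand-ins for the real samples (hypothetical data, not from the question)
my_list <- list(
  sample_1 = data.frame(gene_type = c("gene1", "gene1", "gene2")),
  sample_2 = data.frame(gene_type = c("gene2", "gene3", "gene3"))
)
genes <- c("gene1", "gene2", "gene3")

# One row per gene x sample combination
combos <- expand.grid(gene = genes, sample = names(my_list),
                      stringsAsFactors = FALSE)

# For each combination, count the rows whose gene_type matches the gene pattern
counts <- mapply(
  function(g, s) sum(grepl(g, my_list[[s]]$gene_type)),
  combos$gene, combos$sample
)

# Name each count after its gene and sample
names(counts) <- paste0(combos$gene, "_row_quantity_", combos$sample)

counts[["gene1_row_quantity_sample_1"]]
```

Individual values can then be looked up by name, which is usually easier to work with than many separate variables.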

CodePudding user response：

library(tidyverse)

map(my_list, ~ count(.x, gene_type))

CodePudding user response：

Your current script would count "gene3" and "gene33" together: grepl("gene3", gene_type) is TRUE whenever gene_type contains "gene3", so it matches "gene3", "gene33", and "gene3a" alike:

gene3_row_quantity_sample1 = sample_1 %>%
  dplyr::filter(grepl("gene3", gene_type)) %>%
  nrow()

Is this the desired outcome? Or do you want gene_type == "gene3" and gene_type == "gene33" to be counted separately?
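
The difference can be seen on a small vector. The values below are made up for illustration; the anchored pattern "^gene3$" is one possible way to make grepl match the whole string only.

```r
# Hypothetical gene_type values, for illustration only
gene_type <- c("gene3", "gene33", "gene3a", "gene1")

# Substring match: TRUE for anything containing "gene3"
partial <- grepl("gene3", gene_type)

# Anchored regular expression: TRUE only when the whole string is "gene3"
anchored <- grepl("^gene3$", gene_type)

# Plain equality gives the same result as the anchored pattern
exact <- gene_type == "gene3"

partial   # TRUE TRUE TRUE FALSE
anchored  # TRUE FALSE FALSE FALSE
exact     # TRUE FALSE FALSE FALSE
```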

If you want each gene_type in genes to be matched exactly, I think the best approach is to read in all the samples, group by sample and gene_type, filter out any gene_type that is not in genes (i.e. "gene33"), and then count the occurrences. For example, with these three files:

~/Desktop/S01.tsv
gene_type,count
gene1,22
gene1,23
gene2,34
gene3,33
gene3,34


~/Desktop/S02.tsv
gene_type,count
gene1,22
gene2,23
gene2,34
gene2,33
gene3,34

~/Desktop/S03.tsv
gene_type,count
gene2,22
gene2,23
gene3,34
gene3,33
gene33,34

library(tidyverse)
#install.packages("vroom")
library(vroom)
#install.packages("fs")
library(fs)

setwd("~/Desktop")

filelist <- dir_ls(glob = "*.tsv")

df <- vroom(filelist, id = "Sample")
#> Rows: 15 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): gene_type
#> dbl (1): count
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
genes = c("gene1","gene2","gene3")

df %>%
  group_by(Sample, gene_type) %>%
  filter(gene_type %in% genes) %>%
  summarise(n = n())
#> `summarise()` has grouped output by 'Sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 8 × 3
#> # Groups:   Sample [3]
#>   Sample  gene_type     n
#>   <chr>   <chr>     <int>
#> 1 S01.tsv gene1         2
#> 2 S01.tsv gene2         1
#> 3 S01.tsv gene3         2
#> 4 S02.tsv gene1         1
#> 5 S02.tsv gene2         3
#> 6 S02.tsv gene3         1
#> 7 S03.tsv gene2         2
#> 8 S03.tsv gene3         2

# And, for all gene_types (i.e. inc "gene33"):
df %>%
  group_by(Sample, gene_type) %>%
  summarise(n = n())
#> `summarise()` has grouped output by 'Sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 9 × 3
#> # Groups:   Sample [3]
#>   Sample  gene_type     n
#>   <chr>   <chr>     <int>
#> 1 S01.tsv gene1         2
#> 2 S01.tsv gene2         1
#> 3 S01.tsv gene3         2
#> 4 S02.tsv gene1         1
#> 5 S02.tsv gene2         3
#> 6 S02.tsv gene3         1
#> 7 S03.tsv gene2         2
#> 8 S03.tsv gene3         2
#> 9 S03.tsv gene33        1

Created on 2022-06-09 by the reprex package (v2.0.1)
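
If names like those in the desired output are still wanted, the summary table can be turned into a named vector. This is only a sketch: res below copies the first three rows of the filtered output above by hand, and stripping the ".tsv" extension with tools::file_path_sans_ext() is an assumption about how the samples should be labelled.

```r
# First three rows of the filtered summary above, typed in by hand
res <- data.frame(
  Sample    = c("S01.tsv", "S01.tsv", "S01.tsv"),
  gene_type = c("gene1", "gene2", "gene3"),
  n         = c(2L, 1L, 2L)
)

# Build one name per row from the gene and the file name without its extension
counts <- setNames(
  res$n,
  paste0(res$gene_type, "_row_quantity_", tools::file_path_sans_ext(res$Sample))
)

counts[["gene1_row_quantity_S01"]]
```

Keeping the counts in one table or one named vector generally scales better than creating a separate variable for every gene and sample.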