Hello, this is my first time using for loops in R, and I'm trying to figure out how to create multiple variables automatically with a loop.
I want to run the following command multiple times, changing the gene involved each time:
library(dplyr)
gene1_row_quantity_sample1 = sample_1 %>%
  dplyr::filter(grepl("gene1", gene_type)) %>%
  nrow()
In the code above, two things vary: the gene and the sample. The samples are stored in a list:
my_list = list(
  sample_1 = read.table(file = "S01.tsv", sep = "\t", header = FALSE),
  sample_2 = read.table(file = "S02.tsv", sep = "\t", header = FALSE),
  sample_3 = read.table(file = "S03.tsv", sep = "\t", header = FALSE)
  ...
)
and the genes are stored in a character vector:
genes = c("gene1","gene2","gene3"...)
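Rather than writing one read.table() line per file, the sample list can be built with lapply() over the file names. A minimal sketch, assuming each file has a header row that contains a gene_type column (with header = FALSE, as in the list above, read.table() names the columns V1, V2, ..., so there would be no gene_type column to filter on). The temporary files and their contents are hypothetical and exist only so the sketch runs anywhere.

```r
# Hypothetical: write three small tab-separated files to a temp folder;
# in practice these would be the real S01.tsv, S02.tsv, S03.tsv
dir <- tempdir()
files <- file.path(dir, c("S01.tsv", "S02.tsv", "S03.tsv"))
for (f in files) {
  write.table(data.frame(gene_type = c("gene1", "gene2")), file = f,
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Read every file in one call; header = TRUE (assumed) keeps the gene_type name
my_list <- lapply(files, read.table, sep = "\t", header = TRUE)
names(my_list) <- paste0("sample_", seq_along(files))
str(my_list)
```

Naming the elements sample_1, sample_2, ... keeps them in step with the variable names requested below.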
So how can I wrap the first snippet in a for loop so that it computes and stores one variable per (gene, sample) pair, instead of my writing each one manually?
Desired output:
gene1_row_quantity_sample1
"number of rows"
gene2_row_quantity_sample1
"number of rows"
gene3_row_quantity_sample1
"number of rows"
gene1_row_quantity_sample2
"number of rows"
gene2_row_quantity_sample2
"number of rows"
gene3_row_quantity_sample2
"number of rows"
gene1_row_quantity_sample3
"number of rows"
gene2_row_quantity_sample3
"number of rows"
gene3_row_quantity_sample3
"number of rows"
Thanks for your time
Answer:
I'm not sure whether this works, since you did not provide a reproducible example, but you could try:
library(dplyr)
# For each sample name x, loop over the genes and print "gene sample row_count"
lapply(names(my_list), function(x) {
  for (i in genes) {
    y <- my_list[[x]] %>%
      dplyr::filter(grepl(i, gene_type)) %>%
      nrow()
    print(paste(i, x, y))
  }
})
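The loop above prints each count but does not keep it. Below is a minimal sketch that stores the results instead, assuming each sample data frame has a gene_type column and the list elements are named sample_1, sample_2, ... (the toy data is hypothetical, standing in for the TSV files). It swaps grepl() for an exact == match, for the reason raised in the last answer, and fills a gene-by-sample matrix; assign() is included only to produce the individually named variables the question asks for.

```r
library(dplyr)

# Hypothetical toy data standing in for the TSV files
my_list <- list(
  sample_1 = data.frame(gene_type = c("gene1", "gene1", "gene2", "gene3")),
  sample_2 = data.frame(gene_type = c("gene2", "gene2", "gene3")),
  sample_3 = data.frame(gene_type = c("gene1", "gene3", "gene33"))
)
genes <- c("gene1", "gene2", "gene3")

# One row per gene, one column per sample
counts <- matrix(0L, nrow = length(genes), ncol = length(my_list),
                 dimnames = list(genes, names(my_list)))

for (s in names(my_list)) {
  for (g in genes) {
    # Exact match, so "gene3" does not also count "gene33"
    n <- my_list[[s]] %>%
      dplyr::filter(gene_type == g) %>%
      nrow()
    counts[g, s] <- n
    # Optional: also create e.g. gene1_row_quantity_sample1
    assign(paste0(g, "_row_quantity_", sub("_", "", s)), n)
  }
}
counts
```

A single matrix is usually easier to reuse than many loose variables: counts["gene2", ] gives one gene across all samples, and counts[, "sample_1"] gives one sample across all genes.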
Answer:
library(tidyverse)
# One count table per sample: every distinct gene_type with its number of rows
map(my_list, ~ count(., gene_type))
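map() returns a list with one count table per sample. A small sketch of stacking those tables into a single data frame, with the sample name in its own column, via dplyr::bind_rows(.id = ...). The toy data is hypothetical and assumes a gene_type column. Because count() counts each distinct value exactly, "gene33" gets its own row rather than being merged into "gene3".

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for the TSV files
my_list <- list(
  sample_1 = data.frame(gene_type = c("gene1", "gene1", "gene2", "gene3")),
  sample_2 = data.frame(gene_type = c("gene2", "gene2", "gene3")),
  sample_3 = data.frame(gene_type = c("gene1", "gene3", "gene33"))
)

per_sample <- map(my_list, ~ count(.x, gene_type))   # list of tibbles
all_counts <- bind_rows(per_sample, .id = "sample")  # list names -> "sample"
all_counts
```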
Answer:
Your current script would count "gene3" and "gene33" together: grepl("gene3", gene_type) is TRUE whenever gene_type contains "gene3", e.g. "gene3", "gene33" or "gene3a":
gene3_row_quantity_sample1 = sample_1 %>%
  dplyr::filter(grepl("gene3", gene_type)) %>%
  nrow()
Is this the desired outcome? Or do you want gene_type == "gene3" and gene_type == "gene33" to be counted separately?
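A quick way to see the difference, using a hypothetical vector of gene_type values: grepl() matches the pattern anywhere in the string, while an anchored regular expression or plain == only matches the whole string.

```r
gene_type <- c("gene3", "gene33", "gene3a", "gene1")

grepl("gene3", gene_type)    # TRUE TRUE TRUE FALSE  (substring match)
grepl("^gene3$", gene_type)  # TRUE FALSE FALSE FALSE (anchored: whole string)
gene_type == "gene3"         # TRUE FALSE FALSE FALSE (exact equality)
```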
If you want each gene_type in genes to be matched exactly, I think the best approach is to read in all samples at once, group by sample and gene_type, drop the rows whose gene_type is not in genes (i.e. drop "gene33"), and then count the occurrences. For example, given these three files (comma-separated, despite the .tsv extension):
~/Desktop/S01.tsv
gene_type,count
gene1,22
gene1,23
gene2,34
gene3,33
gene3,34
~/Desktop/S02.tsv
gene_type,count
gene1,22
gene2,23
gene2,34
gene2,33
gene3,34
~/Desktop/S03.tsv
gene_type,count
gene2,22
gene2,23
gene3,34
gene3,33
gene33,34
library(tidyverse)
#install.packages("vroom")
library(vroom)
#install.packages("fs")
library(fs)
setwd("~/Desktop")
filelist <- dir_ls(glob = "*.tsv")
df <- vroom(filelist, id = "Sample")
#> Rows: 15 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): gene_type
#> dbl (1): count
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
genes = c("gene1","gene2","gene3")
df %>%
group_by(Sample, gene_type) %>%
filter(gene_type %in% genes) %>%
summarise(n = n())
#> `summarise()` has grouped output by 'Sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 8 × 3
#> # Groups: Sample [3]
#> Sample gene_type n
#> <chr> <chr> <int>
#> 1 S01.tsv gene1 2
#> 2 S01.tsv gene2 1
#> 3 S01.tsv gene3 2
#> 4 S02.tsv gene1 1
#> 5 S02.tsv gene2 3
#> 6 S02.tsv gene3 1
#> 7 S03.tsv gene2 2
#> 8 S03.tsv gene3 2
# And, for all gene_types (i.e. inc "gene33"):
df %>%
group_by(Sample, gene_type) %>%
summarise(n = n())
#> `summarise()` has grouped output by 'Sample'. You can override using the
#> `.groups` argument.
#> # A tibble: 9 × 3
#> # Groups: Sample [3]
#> Sample gene_type n
#> <chr> <chr> <int>
#> 1 S01.tsv gene1 2
#> 2 S01.tsv gene2 1
#> 3 S01.tsv gene3 2
#> 4 S02.tsv gene1 1
#> 5 S02.tsv gene2 3
#> 6 S02.tsv gene3 1
#> 7 S03.tsv gene2 2
#> 8 S03.tsv gene3 2
#> 9 S03.tsv gene33 1
Created on 2022-06-09 by the reprex package (v2.0.1)
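Note that the first summary has no row for gene1 in S03.tsv, because summarise() only returns groups that actually occur. If zero counts matter, a base-R alternative (a swap for the dplyr pipeline, not what the answer above uses) is table() with the genes as factor levels. The sketch below rebuilds only the gene_type column of the three example files in memory instead of reading them from ~/Desktop.

```r
genes <- c("gene1", "gene2", "gene3")

# gene_type values transcribed from the example S01/S02/S03 files above
df <- data.frame(
  Sample = rep(c("S01.tsv", "S02.tsv", "S03.tsv"), each = 5),
  gene_type = c("gene1", "gene1", "gene2", "gene3", "gene3",
                "gene1", "gene2", "gene2", "gene2", "gene3",
                "gene2", "gene2", "gene3", "gene3", "gene33")
)

# Keep only the genes of interest (exact match), then cross-tabulate;
# the factor levels make every gene a column, even when its count is zero
kept <- df[df$gene_type %in% genes, ]
tab <- table(kept$Sample, factor(kept$gene_type, levels = genes))
tab
```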