I have a list of 58 data frames called nafilt_persample.ngsrep. Each data frame is named after an individual ID (SVT_01 ... SVT_58) and contains 15 columns of character or numeric values, like this:
Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage VAF
1: SVT_01 DNMT3A chr2 25464495 25464495 SNP Missense_Mutation 2835 0.011
2: SVT_01 JAK2 chr9 5073770 5073770 SNP Missense_Mutation 2533 0.028
cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads
1: c.2018G>T p.Gly673Val C A 2808 27
2: c.1849G>T p.V617F G T 2455 78
I need to add two columns, lCI and uCI, to each data frame in the list. The values come from a second list, called cint, which is organised by the same sample ID (SVT_) and then by gene, and looks like this:
$DNMT3A
[1] 0.006285366 0.013826599
attr(,"conf.level")
[1] 0.95
$JAK2
[1] 0.02441547 0.03828421
attr(,"conf.level")
[1] 0.95
I would like to obtain a result like this:
Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage VAF
1: SVT_01 DNMT3A chr2 25464495 25464495 SNP Missense_Mutation 2835 0.011
2: SVT_01 JAK2 chr9 5073770 5073770 SNP Missense_Mutation 2533 0.028
cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads lCI uCI
1: c.2018G>T p.Gly673Val C A 2808 27 0.006 0.013
2: c.1849G>T p.V617F G T 2455 78 0.024 0.038
So far I have tried this but without success:
merged.list <- list()
for (i in names(nafilt_persample.ngsrep)) {
  for (k in nafilt_persample.ngsrep[[i]]$Hugo_Symbol) {
    merged.list[[i]] <- cbind(nafilt_persample.ngsrep[[i]], cint[[i]][[k]][1], cint[[i]][[k]][2])
  }
}
The problem is that although the two columns are added, only the values from the last loop iteration end up in them. For the SVT_01 example shown above, this is the result:
Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage VAF
1: SVT_01 DNMT3A chr2 25464495 25464495 SNP Missense_Mutation 2835 0.011
2: SVT_01 JAK2 chr9 5073770 5073770 SNP Missense_Mutation 2533 0.028
cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads lCI uCI
1: c.2018G>T p.Gly673Val C A 2808 27 0.024 0.038
2: c.1849G>T p.V617F G T 2455 78 0.024 0.038
That is: the CI of JAK2 is duplicated onto the DNMT3A row. How can I fix this? Hope I provided enough info
CodePudding user response:
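We could do
nafilt_persample.ngsrep <- Map(function(dat, ci) {
  # one row per variant: column 1 = lower bound, column 2 = upper bound
  bounds <- do.call(rbind, ci[dat$Hugo_Symbol])
  dat$lCI <- unname(bounds[, 1])
  dat$uCI <- unname(bounds[, 2])
  dat
}, nafilt_persample.ngsrep, cint)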
Or with a for loop
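for (nm in names(nafilt_persample.ngsrep)) {
  dat <- nafilt_persample.ngsrep[[nm]]
  # stack the per-gene c(lower, upper) vectors, one row per variant
  bounds <- do.call(rbind, cint[[nm]][dat$Hugo_Symbol])
  dat$lCI <- unname(bounds[, 1])
  dat$uCI <- unname(bounds[, 2])
  nafilt_persample.ngsrep[[nm]] <- dat
}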
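For reference, the attempt in the question fails because the inner loop overwrites merged.list[[i]] on every pass, and cbind recycles the two scalar bounds over all rows, so only the last gene's interval survives. Below is a minimal self-contained sketch of the per-row lookup on toy data modelled on the SVT_01 rows. The helper name add_ci and the trimmed-down columns are assumptions for illustration, not part of the original data or of either answer.

```r
# Toy input modelled on the question: one data frame per sample ID ...
nafilt_persample.ngsrep <- list(
  SVT_01 = data.frame(
    Tumor_Sample_Barcode = "SVT_01",
    Hugo_Symbol = c("DNMT3A", "JAK2"),
    VAF = c(0.011, 0.028)
  )
)

# ... and, per sample ID, a named list of c(lower, upper) vectors per gene
cint <- list(
  SVT_01 = list(
    DNMT3A = c(0.006285366, 0.013826599),
    JAK2   = c(0.02441547, 0.03828421)
  )
)

# Hypothetical helper: look up the interval for each row's gene
add_ci <- function(dat, ci) {
  # vapply returns a 2 x nrow(dat) matrix: row 1 = lower, row 2 = upper
  bounds <- vapply(ci[dat$Hugo_Symbol],
                   function(x) as.numeric(x[1:2]),
                   numeric(2))
  dat$lCI <- unname(bounds[1, ])
  dat$uCI <- unname(bounds[2, ])
  dat
}

# Align cint to the sample names before pairing the two lists
result <- Map(add_ci,
              nafilt_persample.ngsrep,
              cint[names(nafilt_persample.ngsrep)])

print(result$SVT_01[, c("Hugo_Symbol", "lCI", "uCI")])
```

Because the lookup is driven by dat$Hugo_Symbol, each row receives the interval for its own gene, whatever order the genes appear in inside cint.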
CodePudding user response:
Here is another option. I would recommend making a small reproducible example in the future. Here is one:
library(tidyverse)
# example data
nafilt_persample.ngsrep <- map(1:3, ~tibble(Tumor_Sample_Barcode = c(glue::glue("SVT_0{.x}")),
                                            Hugo_Symbol = c("DNMT3A", "JAK2"))) |>
  `names<-`(paste0("SVT_0", 1:3))
set.seed(32)
cint <- map(1:3, ~ list(DNMT3A = c(runif(1, 0, 0.3), runif(1, 0.3, 1)),
                        JAK2 = c(runif(1, 0, 0.3), runif(1, 0.3, 1))))
# solution
map2(nafilt_persample.ngsrep, cint,
     \(dat, ci){
       col_add <- tibble(name = names(ci),
                         dat = ci) |>
         unnest_wider(dat, names_repair = \(x) c("Hugo_Symbol", "lCI", "uCI")) |>
         suppressMessages()
       left_join(dat, col_add, by = "Hugo_Symbol")
     })
#> $SVT_01
#> # A tibble: 2 x 4
#> Tumor_Sample_Barcode Hugo_Symbol lCI uCI
#> <chr> <chr> <dbl> <dbl>
#> 1 SVT_01 DNMT3A 0.152 0.716
#> 2 SVT_01 JAK2 0.243 0.810
#>
#> $SVT_02
#> # A tibble: 2 x 4
#> Tumor_Sample_Barcode Hugo_Symbol lCI uCI
#> <chr> <chr> <dbl> <dbl>
#> 1 SVT_02 DNMT3A 0.0456 0.969
#> 2 SVT_02 JAK2 0.226 0.896
#>
#> $SVT_03
#> # A tibble: 2 x 4
#> Tumor_Sample_Barcode Hugo_Symbol lCI uCI
#> <chr> <chr> <dbl> <dbl>
#> 1 SVT_03 DNMT3A 0.202 0.571
#> 2 SVT_03 JAK2 0.197 0.525