Adding columns/values on a list of dataframes from a second list

Time:10-30

I have a list of 58 data frames called nafilt_persample.ngsrep. Its elements are named by individual ID: SVT_01 ... SVT_58. Each data frame contains 15 columns of character or numeric values, like this:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27
2:   c.1849G>T        p.V617F                G                 T      2455        78

I need to add two columns, lCI and uCI, to each data frame in the list. The values come from a second list, cint, which is ordered by the same IDs (SVT_) and, within each ID, by gene. For one ID it looks like this:

$DNMT3A
[1] 0.006285366 0.013826599
attr(,"conf.level")
[1] 0.95

$JAK2
[1] 0.02441547 0.03828421
attr(,"conf.level")
[1] 0.95

I would like to obtain a result like this:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads   lCI   uCI
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27 0.006 0.014
2:   c.1849G>T        p.V617F                G                 T      2455        78 0.024 0.038

So far I have tried this but without success:

merged.list <- list()

for (i in names(nafilt_persample.ngsrep)){ for (k in nafilt_persample.ngsrep[[i]]$Hugo_Symbol){
  merged.list[[i]] <- cbind(nafilt_persample.ngsrep[[i]], cint[[i]][[k]][1], cint[[i]][[k]][2])
    }
}

The problem is that, although the two columns are added, only the values from the last loop iteration are kept. For the SVT_01 example shown above, this is the result:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads lCI  uCI
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27 0.024  0.038
2:   c.1849G>T        p.V617F                G                 T      2455        78 0.024 0.038

That is, the CI of JAK2 is duplicated onto the DNMT3A row. How can I fix this? I hope I have provided enough information.

CodePudding user response:

We could do

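# Map() pairs the two lists by position, so both must be in the same ID order
nafilt_persample.ngsrep <- Map(function(dat, ci) {
    # stack the intervals in the row order of dat: column 1 = lower, column 2 = upper
    bounds <- do.call(rbind, ci[as.character(dat$Hugo_Symbol)])
    dat$lCI <- bounds[, 1]
    dat$uCI <- bounds[, 2]
    dat
}, nafilt_persample.ngsrep, cint)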

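Or with a for loop (this version looks up cint by name, so cint must carry the same ID names)

for (nm in names(nafilt_persample.ngsrep)) {
    genes <- as.character(nafilt_persample.ngsrep[[nm]]$Hugo_Symbol)
    # one row per gene, in the row order of the data frame
    bounds <- do.call(rbind, cint[[nm]][genes])
    nafilt_persample.ngsrep[[nm]]$lCI <- bounds[, 1]
    nafilt_persample.ngsrep[[nm]]$uCI <- bounds[, 2]
}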
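
As a rough sketch of why the loop in the question repeats the last gene's interval, and of one possible row-wise lookup that avoids it, here is a toy example. The data frame `dat`, the list `ci`, and their values are made up for illustration and stand in for a single element of nafilt_persample.ngsrep and of cint.

```r
# Toy stand-ins for one sample (hypothetical values, not the real data)
dat <- data.frame(Tumor_Sample_Barcode = "SVT_01",
                  Hugo_Symbol = c("DNMT3A", "JAK2"),
                  stringsAsFactors = FALSE)
ci <- list(DNMT3A = c(0.006, 0.014), JAK2 = c(0.024, 0.038))

# Pattern from the question: each pass rebuilds the result from the
# unmodified dat, and cbind() recycles the single value down every row,
# so only the last gene's interval survives.
for (k in dat$Hugo_Symbol) {
  overwritten <- cbind(dat, lCI = ci[[k]][1], uCI = ci[[k]][2])
}
overwritten

# Row-wise lookup: stack the intervals in the row order of dat,
# then assign the two matrix columns as new data frame columns.
bounds <- do.call(rbind, ci[dat$Hugo_Symbol])
fixed <- dat
fixed$lCI <- unname(bounds[, 1])
fixed$uCI <- unname(bounds[, 2])
fixed
```

Printing `overwritten` and `fixed` side by side should make the difference visible: the first carries the JAK2 bounds on both rows, the second carries one interval per gene.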

CodePudding user response:

Here is another option. I would recommend making a small reproducible example in the future. Here is one:

library(tidyverse)

#example data
nafilt_persample.ngsrep <- map(1:3, ~tibble(Tumor_Sample_Barcode =c(glue::glue("SVT_0{.x}")),
                 Hugo_Symbol = c("DNMT3A", "JAK2"))) |>
  `names<-`(paste0("SVT_0", 1:3))

set.seed(32)
cint <- map(1:3, ~ list(DNMT3A = c(runif(1, 0, 0.3), runif(1, 0.3, 1)),
                        JAK2 = c(runif(1, 0, 0.3), runif(1, 0.3, 1))))


#solution
map2(nafilt_persample.ngsrep, cint, 
    \(dat, ci){
      # one row per gene: element 1 of each interval is the lower bound, element 2 the upper
      col_add <- tibble(Hugo_Symbol = names(ci),
                        lCI = map_dbl(ci, 1),
                        uCI = map_dbl(ci, 2))
      
      left_join(dat, col_add, by = "Hugo_Symbol")
    })
#> $SVT_01
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol   lCI   uCI
#>   <chr>                <chr>       <dbl> <dbl>
#> 1 SVT_01               DNMT3A      0.152 0.716
#> 2 SVT_01               JAK2        0.243 0.810
#> 
#> $SVT_02
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol    lCI   uCI
#>   <chr>                <chr>        <dbl> <dbl>
#> 1 SVT_02               DNMT3A      0.0456 0.969
#> 2 SVT_02               JAK2        0.226  0.896
#> 
#> $SVT_03
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol   lCI   uCI
#>   <chr>                <chr>       <dbl> <dbl>
#> 1 SVT_03               DNMT3A      0.202 0.571
#> 2 SVT_03               JAK2        0.197 0.525
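
For comparison, the same lookup-then-join idea can be sketched in base R without the tidyverse. This is only an illustration under assumptions: `dat`, `ci`, and the TET2 row are hypothetical, and TET2 is included on purpose to show what happens when a gene has no interval (the row is kept and gets NA, which is also how a left join treats unmatched rows).

```r
# Toy inputs (made-up values); TET2 deliberately has no interval in ci
dat <- data.frame(Hugo_Symbol = c("DNMT3A", "JAK2", "TET2"),
                  stringsAsFactors = FALSE)
ci <- list(DNMT3A = c(0.006, 0.014), JAK2 = c(0.024, 0.038))

# Lookup table: one row per gene with its lower and upper bound
lookup <- data.frame(Hugo_Symbol = names(ci),
                     lCI = vapply(ci, `[`, numeric(1), 1),
                     uCI = vapply(ci, `[`, numeric(1), 2),
                     stringsAsFactors = FALSE)

# match() returns NA for genes absent from the lookup, so those rows get NA
idx <- match(dat$Hugo_Symbol, lookup$Hugo_Symbol)
dat$lCI <- lookup$lCI[idx]
dat$uCI <- lookup$uCI[idx]
dat
```

To apply this to the whole list, the body could be wrapped in a function and passed to Map() together with the two lists.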