Home > Software design >  When using mutate for several columns in dplyr, how can I reference to another data frame by row nam
When using mutate for several columns in dplyr, how can I reference to another data frame by row nam

Time:12-03

I need a push to the right direction... or a hint where to look further. I have a large dataframe containing results of analyses for samples - a key column and then several columns for analytical targets. I managed to filter certain conditions, summarise some statistics and store them in a second dataframe. (dft2_stat_norm)

Now I want to column-wise divide all target values of the original (unfiltered) tibble (e. g. below "dft_Resp_norm") by the corresponding (row) value stored in the second tibble. I have all targets stored in a vector "analytes". The column and row are linked by the target descriptions, e. g. "G444".

  1. can you suggest a solution?
  2. can you suggest a source (comprehensive for lateral entrants ;)) to read further on this?

Thanks!

> dft_Resp_norm

# A tibble: 39 × 7

Datum      Lfd_Nr Probe  cond     G444   G448   S453
\<chr\>       \<dbl\> \<chr\>  \<chr\>   \<dbl\>  \<dbl\>  \<dbl\>
1 09.01.2020     16 NK     NK    0.00586 0.0591 0.0594
2 04.02.2020     37 NK     NK    0.00661 0.0609 0.0944
3 12.02.2020     57 NK     NK    0.00611 0.0674 0.116
4 13.03.2020     67 NK-2   NK    0.00122 0.0678 0.115
5 13.03.2020     68 NK-007 NK    0.0138  0.115  0.359
> dft2_stat_norm

# A tibble: 3 × 2
  An_Names median_Resp_norm
  <chr>               <dbl>
1 G444              0.00678
2 G448              0.0696 
3 S453              0.126  

I tried it directly ...

dft_MOM_full <- df_Resp_full %>%
  select(1:7) %>%
  mutate(across(any_of(analytes); -?-
  )

but I can link the column name to the corresponding row in the second tibble, and I tried it by grouping ...

dft_MOM_full <- df_Resp_full %>%
  select(1:7) %>%
  pivot_longer(any_of(analytes),
               names_to = "target",
               values_to = "MOM") %>%
  mutate(MOM = MOM / -?-        
  )

... but I didn't manage to find out how to link the group name to the second tibble

CodePudding user response:

Using a left_join you could do:

library(dplyr)
library(tidyr)

analytes <- dft2_stat_norm$An_Names

dft_MOM_full <- dft_Resp_norm %>%
  pivot_longer(any_of(analytes),
               names_to = "target",
               values_to = "MOM") |> 
  left_join(dft2_stat_norm, by = c("target" = "An_Names")) |> 
  mutate(MOM = MOM / median_Resp_norm)

dft_MOM_full
#> # A tibble: 15 × 7
#>    Datum      Lfd_Nr Probe  cond  target   MOM median_Resp_norm
#>    <chr>       <int> <chr>  <chr> <chr>  <dbl>            <dbl>
#>  1 09.01.2020     16 NK     NK    G444   0.864          0.00678
#>  2 09.01.2020     16 NK     NK    G448   0.849          0.0696 
#>  3 09.01.2020     16 NK     NK    S453   0.471          0.126  
#>  4 04.02.2020     37 NK     NK    G444   0.975          0.00678
#>  5 04.02.2020     37 NK     NK    G448   0.875          0.0696 
#>  6 04.02.2020     37 NK     NK    S453   0.749          0.126  
#>  7 12.02.2020     57 NK     NK    G444   0.901          0.00678
#>  8 12.02.2020     57 NK     NK    G448   0.968          0.0696 
#>  9 12.02.2020     57 NK     NK    S453   0.921          0.126  
#> 10 13.03.2020     67 NK-2   NK    G444   0.180          0.00678
#> 11 13.03.2020     67 NK-2   NK    G448   0.974          0.0696 
#> 12 13.03.2020     67 NK-2   NK    S453   0.913          0.126  
#> 13 13.03.2020     68 NK-007 NK    G444   2.04           0.00678
#> 14 13.03.2020     68 NK-007 NK    G448   1.65           0.0696 
#> 15 13.03.2020     68 NK-007 NK    S453   2.85           0.126
  • Related