Home > front end >  R- How do I use a lookup table containing threshold values that vary for different variables (column
R- How do I use a lookup table containing threshold values that vary for different variables (column

Time:10-23

I am trying to streamline the process of auditing chemistry laboratory data. When we encounter data where an analyte is not detected I need to change the recorded result to a value equal to 1/2 of the level of detection (LOD) for the analytical method. I have LOD's contained within another dataframe to be used as a lookup table.

I have multiple columns representing data from different analytical tests, each with it's own unique LOD. Here's an example of the type of data I am working with:

library(tidyverse)
dat <- tibble("Lab_ID" = as.character(seq(1,10,1)),
                 "Tributary" = c('sawmill','paint', 'herring', 'water',
                               'paint', 'sawmill', 'bolt', 'water',
                               'herring', 'sawmill'),
                 "date" = rep(as.POSIXct("2021-10-01 12:00:00"), 10),
                 "TP" = c(1.5,15.7,-2.3,7.6,0.1,45.6,12.2,-0.1,22.2,0.6),
                 "TN" = c(100.3,56.2,-10.5,0.4,-0.3,11.0,45.8,256.0,12.2,144.0),
                 "DOC" = c(56.0,120.3,-10.5,0.2,14.6,489.3,0.3,14.4,54.6,88.8))
dat


detect_level <- tibble("Parameter" = c('TP', 'TN', 'DOC'),
                          'LOD' = c(0.6, 11, 0.3)) %>% 
  mutate(halfLOD=LOD/2)
detect_level

I have poured over multiple other questions with a similar theme:

Change values in multiple columns of a dataframe using a lookup table

R - Match values from multiple columns in a data.frame to a lookup table.

Replace values in multiple columns using different thresholds

and gotten to a point where I have pivoted the data and split it out into a list of dataframes that are specific analytes:

dat %>% 
    pivot_longer(cols = c('TP','TN','DOC')) %>% 
    arrange(name) %>% 
    split(.$name) 

I have tried to apply a function using map(), however I cannot figure out how to integrate the values from the lookup table (detect_level) into my code. If someone could help me continue this pipe, or finish the process to achieve a final product dat2 that should look like this I would appreciate it:

dat2 <- tibble("Lab_ID" = as.character(seq(1,10,1)),
              "Tributary" = c('sawmill','paint', 'herring', 'water',
                              'paint', 'sawmill', 'bolt', 'water',
                              'herring', 'sawmill'),
              "date" = rep(as.POSIXct("2021-10-01 12:00:00"), 10),
              "TP" = c(1.5,15.7,0.3,7.6,0.3,45.6,12.2,0.3,22.2,0.6),
              "TN" = c(100.3,56.2,5.5,5.5,5.5,11.0,45.8,256.0,12.2,144.0),
              "DOC" = c(56.0,120.3,0.15,0.15,14.6,489.3,0.3,14.4,54.6,88.8))

dat2

Another possibility would be from the closest similar question I have found is:

Lookup multiple column from a single table

Here's a snippet of code that I have adapted from this question, however, if you run it you will see that where values exist that are not found in detect_level an NA is returned. Additionally, it does not appear to have worked for $TN or $DOC, even in cases when the $LOD value from detect_level was present.

dat %>% 
    mutate(across(all_of(unique(detect_level$Parameter)),
                  ~ {i1 <- detect_level$Parameter == cur_column()
                  detect_level$LOD[i1][match(., detect_level$LOD)]}))

I am not comfortable at all with the purrr language here and have only adapted this code from the question linked, so I would appreciate if this is the direction an answerer chooses, that they might comment code to explain briefly what is happening "under the hood".

Thank you in advance!

CodePudding user response:

Perhaps this helps

library(dplyr)
dat %>%
   mutate(across(all_of(detect_level$Parameter), 
     ~ pmax(., detect_level$LOD[match(cur_column(), detect_level$Parameter)])))

For the updated case

dat %>%
   mutate(across(all_of(detect_level$Parameter), 
     ~ replace(., . < detect_level$LOD[match(cur_column(), 
    detect_level$Parameter)],detect_level$halfLOD[match(cur_column(), 
    detect_level$Parameter)])))
  • Related