R mutate across and using two dynamically named columns to calculate result

Time:03-02

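library(dplyr)   # tribble(), mutate(), across(), cur_column()
library(stringr) # str_detect(), str_sub()

Test <- tribble(~Date, ~HCl_Konz, ~HCl_Kenn, ~CO_Konz, ~CO_Kenn,
                1, 4, "", 4, "",
                2, 5, "", 1, "",
                3, 2, "X", 6, "BX",
                4, 5, "", 4, "",
                5, 6, "F", 4, "",
                6, 5, "", 9, "EXr")
Param <- c("HCl", "CO")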

The real tibble is much bigger and has several more column pairs like HCl and CO, all following the same scheme. For each pair, I want to set the value in HCl_Konz to NA if the corresponding value in HCl_Kenn contains at least one of the characters "X" or "F"; the same goes for CO_Konz (if CO_Kenn contains X or F) and for all the other XXX_Konz columns.

I tried the following code:

Test %>% rowwise() %>%
    mutate(across(paste(Param, "_Konz", sep=""), ~ ifelse(str_detect(paste(str_sub(cur_column(),1,-6), "_Kenn", sep=""), "[XF]"), NA_real_, .x)))

The code doesn't throw an error, but the values are not replaced by NA.

tia

CodePudding user response:

  1. You're missing the ~ to mark the ifelse(..) as a function of sorts.
  2. cur_col() not found (for me), should likely be . or .x
  3. You are str_detecting in the name of the _Kenn-equivalent column, not the values in that column; we need to add cur_data()[[..]] as well.
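
Point 3 can be checked in isolation. The sketch below swaps in base R's sub() and grepl() for str_sub() and str_detect() so that it needs no packages; the variable names are illustrative only and do not appear in the question.

```r
# The column name that the question's code builds for the first parameter
kenn_name <- sub("_Konz$", "_Kenn", "HCl_Konz")   # "HCl_Kenn"

# Matching against the NAME: "HCl_Kenn" contains neither X nor F,
# so the condition is FALSE for every row and nothing gets replaced
name_match <- grepl("[XF]", kenn_name)

# Matching against the VALUES of that column gives one result per row
kenn_values <- c("", "", "X", "", "F", "")
value_match <- grepl("[XF]", kenn_values)

print(name_match)   # FALSE
print(value_match)  # FALSE FALSE  TRUE FALSE  TRUE FALSE
```

This is why the question's code runs without an error yet leaves every value untouched.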

I tend to not use stringr for straight-forward replacements like this, preferring base R:

library(dplyr)
Test %>%
  mutate(
    across(
      paste0(Param, "_Konz"),
      ~ if_else( grepl("[XF]", cur_data()[[ gsub("_Konz", "_Kenn", cur_column()) ]] ),
                .[NA], . )
    )
  )
# # A tibble: 6 x 5
#    Date HCl_Konz HCl_Kenn CO_Konz CO_Kenn
#   <dbl>    <dbl> <chr>      <dbl> <chr>  
# 1     1        4 ""             4 ""     
# 2     2        5 ""             1 ""     
# 3     3       NA "X"           NA "BX"   
# 4     4        5 ""             4 ""     
# 5     5       NA "F"            4 ""     
# 6     6        5 ""            NA "EXr"  
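
As a cross-check, the same replacement can be sketched in base R with a plain loop over Param. This is a different technique from the across() call above, not a rewrite of it, and it avoids cur_data(), which has been deprecated since dplyr 1.1.0 in favor of pick(). The data is rebuilt here as a plain data.frame (an assumption made only to keep the example self-contained), and the names p, konz, kenn and flagged are illustrative.

```r
# Same data as in the question, as a plain data.frame
Test <- data.frame(
  Date     = 1:6,
  HCl_Konz = c(4, 5, 2, 5, 6, 5),
  HCl_Kenn = c("", "", "X", "", "F", ""),
  CO_Konz  = c(4, 1, 6, 4, 4, 9),
  CO_Kenn  = c("", "", "BX", "", "", "EXr"),
  stringsAsFactors = FALSE
)
Param <- c("HCl", "CO")

for (p in Param) {
  konz <- paste0(p, "_Konz")                # value column, e.g. "HCl_Konz"
  kenn <- paste0(p, "_Kenn")                # flag column,  e.g. "HCl_Kenn"
  flagged <- grepl("[XF]", Test[[kenn]])    # TRUE where the flag has X or F
  Test[[konz]][flagged] <- NA               # keeps the column's numeric type
}

print(Test)
```

The NA positions should match the tibble printed above: rows 3 and 5 for HCl_Konz, rows 3 and 6 for CO_Konz.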

I recommend dplyr::if_else over ifelse for several reasons, but it comes with the strict (and safe!) requirement that the true= and false= arguments have the same type. Your use of NA_real_ already accounts for most of this; my use of .[NA] is another way of ensuring that we get the correct NA variant for the actual data, so the method still works if, for example, some of your Param columns are integer and others are double.
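
The .[NA] idiom can be seen on its own in base R. In this small sketch the vectors int_col and dbl_col are made up for illustration; indexing with a bare NA returns an all-NA vector of the same length and type as the input.

```r
int_col <- c(4L, 5L, 2L)   # integer column (hypothetical)
dbl_col <- c(4.5, 5, 2)    # double column (hypothetical)

# A logical NA index is recycled to the vector's length,
# so the result has one NA per element, typed like the input
int_na <- int_col[NA]
dbl_na <- dbl_col[NA]

print(typeof(int_na))   # "integer"
print(typeof(dbl_na))   # "double"
print(length(int_na))   # 3
```

Because the NA vector inherits its type from the column, the same lambda can be applied to integer and double columns without hard-coding NA_integer_ or NA_real_.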

An alternative approach (which may help later) is to pivot the data and work with just two columns at a time.

library(tidyr) # pivot_longer
Test %>%
  pivot_longer(
    matches("_(Konz|Kenn)$"),
    names_pattern = "(.*)_(.*)", names_to = c("elem", ".value")
  ) %>%
  mutate(
    Konz = if_else(grepl("[XF]", Kenn), Konz[NA], Konz)
  )
# # A tibble: 12 x 4
#     Date elem   Konz Kenn 
#    <dbl> <chr> <dbl> <chr>
#  1     1 HCl       4 ""   
#  2     1 CO        4 ""   
#  3     2 HCl       5 ""   
#  4     2 CO        1 ""   
#  5     3 HCl      NA "X"  
#  6     3 CO       NA "BX" 
#  7     4 HCl       5 ""   
#  8     4 CO        4 ""   
#  9     5 HCl      NA "F"  
# 10     5 CO        4 ""   
# 11     6 HCl       5 ""   
# 12     6 CO       NA "EXr"

This pivoted format has the advantage of allowing simpler calls to mutate, and (if you plan on plotting this) playing much better with ggplot2's preference for long data.
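
If the wide layout is needed again afterwards, one possible way back is tidyr::pivot_wider with names_glue. The sketch below is an assumption about what such a round trip could look like, not part of the original answer; it uses a shortened, hand-typed copy of the long result (the names long and wide are illustrative).

```r
library(dplyr)  # %>%, tribble()
library(tidyr)  # pivot_wider()

# A shortened copy of the long-format result (dates 1 and 3 only)
long <- tribble(
  ~Date, ~elem, ~Konz, ~Kenn,
  1,     "HCl", 4,     "",
  1,     "CO",  4,     "",
  3,     "HCl", NA,    "X",
  3,     "CO",  NA,    "BX"
)

# Rebuild one column per element/measure pair, e.g. "HCl_Konz", "CO_Kenn"
wide <- long %>%
  pivot_wider(
    names_from  = elem,
    values_from = c(Konz, Kenn),
    names_glue  = "{elem}_{.value}"
  )

print(wide)
```

The column order after widening may differ from the original tibble, so select() or relocate() may be needed if the order matters.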
