R: in for loop, using mutate to compute the difference between two variables dynamically

Time:09-27

Purpose

Suppose I have four variables: two are original variables and the other two are predictions of those originals. (The actual data contain more original variables.)

I want to use a for loop and mutate to create columns holding the difference between each original variable and its prediction. The sample data and my current approach follow:

Sample data

  set.seed(10000)
  id <- sample(1:20, 100, replace=T)
  set.seed(10001)
  dv.1 <- sample(1:20, 100, replace=T)
  set.seed(10002)
  dv.2 <- sample(1:20, 100, replace=T)
  set.seed(10003)
  pred_dv.1 <- sample(1:20, 100, replace=T)
  set.seed(10004)
  pred_dv.2 <- sample(1:20, 100, replace=T)
  
  d <-   
    data.frame(id, dv.1, dv.2, pred_dv.1, pred_dv.2) 

Current approach (with Error)

 original <- d %>% select(starts_with('dv.'))  %>% names(.) 
  
  pred <- d %>% select(starts_with('pred_dv.')) %>% names(.) 
  
  for (i in 1:length(original)){
    d <-
      d %>% 
      mutate(diff = original[i] - pred[i])
  
    l <- length(d)
    
    colnames(d[l])  <- paste0(original[i], '.diff')

  }

Error: Problem with mutate() input diff.
x non-numeric argument to binary operator
ℹ Input diff is original[i] - pred[i].
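The error most likely arises because original[i] and pred[i] are character strings (column names), not the columns themselves, so the minus operator is applied to two strings. Each iteration also writes to the same column, diff. The block below is a minimal sketch of a repaired loop, not the only way to fix it. It assumes dplyr 1.0 or later, that every prediction column is named "pred_" plus the original name, and it uses a small illustrative data frame (the first three rows of the output printed in the answers) instead of the full sample data. The name new_name is introduced here for illustration only.

```r
library(dplyr)

# Small illustrative data (assumption: first three rows of the printed output)
d <- data.frame(
  id        = c(15, 13, 12),
  dv.1      = c(5, 4, 20),
  dv.2      = c(1, 4, 13),
  pred_dv.1 = c(5, 5, 6),
  pred_dv.2 = c(15, 11, 13)
)

# Column names of the original variables, captured once before the loop
original <- d %>% select(starts_with("dv.")) %>% names()
# Derive prediction names from the originals so each pair matches by name
pred <- paste0("pred_", original)

for (i in seq_along(original)) {
  new_name <- paste0(original[i], ".diff")  # hypothetical helper name
  d <- d %>%
    # .data[[...]] looks a column up by its name stored as a string;
    # !!new_name := sets the name of the new column from a string
    mutate(!!new_name := .data[[original[i]]] - .data[[pred[i]]])
}

print(d)
```

Naming the new column inside mutate avoids the separate colnames() step, so no iteration overwrites the result of the previous one.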

CodePudding user response:

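library(dplyr)

d %>% 
  mutate(
    across(
      .cols = starts_with("dv"),
      # cur_column() is the name of the column being processed;
      # get() fetches the prediction column with the matching name
      .fns = ~ . - (get(paste0("pred_", cur_column()))),
      .names = "diff_{.col}"
    )
  )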

# A tibble: 100 x 7
     id  dv.1  dv.2 pred_dv.1 pred_dv.2 diff_dv.1 diff_dv.2
   <int> <int> <int>     <int>     <int>    <int>    <int>
 1    15     5     1         5        15        0      -14
 2    13     4     4         5        11       -1       -7
 3    12    20    13         6        13       14        0
 4    20    11     8        13         3       -2        5
 5     9    11    10         7        13        4       -3
 6    13     3     3         6        17       -3      -14
 7     3    12    19         6        17        6        2
 8    19     6     7        11         4       -5        3
 9     6     7    12        19         6      -12        6
10    13    10    15         6         7        4        8
# ... with 90 more rows

CodePudding user response:

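Subtraction can be applied to data frames directly.

So you can create one vector of original column names and another of prediction column names, then subtract the corresponding data frame subsets to create the new columns. The columns are paired by position, so both vectors must list the variables in the same order.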

orig_var <- grep('^dv', names(d), value = TRUE)
pred_var <- grep('pred', names(d), value = TRUE)
d[paste0(orig_var, '.diff')] <- d[orig_var] - d[pred_var]
d

#    id dv.1 dv.2 pred_dv.1 pred_dv.2 dv.1.diff dv.2.diff
#1   15    5    1         5        15         0       -14
#2   13    4    4         5        11        -1        -7
#3   12   20   13         6        13        14         0
#4   20   11    8        13         3        -2         5
#5    9   11   10         7        13         4        -3
#...
#...
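Because the subtraction above pairs columns by position, one way to guard against misaligned pairs is to build the prediction names from the original names. The block below is a sketch under that assumption (each prediction column is named "pred_" plus the original name), using a small illustrative data frame rather than the full sample data.

```r
# Small illustrative data (assumption: first three rows of the printed output)
d <- data.frame(
  id        = c(15, 13, 12),
  dv.1      = c(5, 4, 20),
  dv.2      = c(1, 4, 13),
  pred_dv.1 = c(5, 5, 6),
  pred_dv.2 = c(15, 11, 13)
)

orig_var <- grep("^dv", names(d), value = TRUE)
# Derive the prediction names so that position i always refers to the same variable
pred_var <- paste0("pred_", orig_var)

# Stop early if any expected prediction column is missing
stopifnot(all(pred_var %in% names(d)))

d[paste0(orig_var, ".diff")] <- d[orig_var] - d[pred_var]
print(d)
```

Note that running the grep step again after the .diff columns exist would also match them, since their names start with "dv" too.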