mutate dynamically columns based on other columns-CodePudding

I would like to mutate new columns with residuals by solving an easy equation of linear regression.

Let's say that I have the table below which contains more columns with the same structure. Each linear function has the same dependent variable.

Dependent_VAR	X_Independent	X_Coefficient	X_Intercept	Y_Intependent	Y_Coefficient	Y_Intercept
1200	2000	3000	4500	500	750	600
3100	3000	3000	4500	670	750	600
52222	55500	3000	4500	700	750	600
400	456	3000	4500	800	750	600

I want to find the residual for each column name of the dataframe by creating new columns with the new name name_Residual like X_Residual = X_Independent-(X_Intercept X_Coefficient*Dependent_VAR). I would prefer a solution using dplyr.

CodePudding user response：

Another option would be to first reshape your data to tidy format using tidyr::pivot_longer. Afterwards computing the residuals is pretty straightforward:

library(dplyr)
library(tidyr)

dat |> 
  pivot_longer(-Dependent_VAR, names_to = c("name", ".value"), names_sep = "_") |> 
  mutate(Residual = Independent - (Intercept   Dependent_VAR * Coefficient))
#> # A tibble: 8 × 6
#>   Dependent_VAR name  Independent Coefficient Intercept   Residual
#>           <int> <chr>       <int>       <int>     <int>      <int>
#> 1          1200 X            2000        3000      4500   -3602500
#> 2          1200 Y             500         750       600    -900100
#> 3          3100 X            3000        3000      4500   -9301500
#> 4          3100 Y             670         750       600   -2324930
#> 5         52222 X           55500        3000      4500 -156615000
#> 6         52222 Y             700         750       600  -39166400
#> 7           400 X             456        3000      4500   -1204044
#> 8           400 Y             800         750       600    -299800

And if necessary you could of course reshape back to wide afterwards:

dat |> 
  pivot_longer(-Dependent_VAR, names_to = c("name", ".value"), names_sep = "_") |> 
  mutate(Residual = Independent - (Intercept   Dependent_VAR * Coefficient)) |> 
  pivot_wider(names_from = name, values_from = !c(Dependent_VAR, name))
#> # A tibble: 4 × 9
#>   Dependent_VAR Indepe…¹ Indep…² Coeff…³ Coeff…⁴ Inter…⁵ Inter…⁶ Resid…⁷ Resid…⁸
#>           <int>    <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
#> 1          1200     2000     500    3000     750    4500     600 -3.60e6 -9.00e5
#> 2          3100     3000     670    3000     750    4500     600 -9.30e6 -2.32e6
#> 3         52222    55500     700    3000     750    4500     600 -1.57e8 -3.92e7
#> 4           400      456     800    3000     750    4500     600 -1.20e6 -3.00e5
#> # … with abbreviated variable names ¹Independent_X, ²Independent_Y,
#> #   ³Coefficient_X, ⁴Coefficient_Y, ⁵Intercept_X, ⁶Intercept_Y, ⁷Residual_X,
#> #   ⁸Residual_Y

DATA

dat <- data.frame(
  Dependent_VAR = c(1200L, 3100L, 52222L, 400L),
  X_Independent = c(2000L, 3000L, 55500L, 456L),
  X_Coefficient = c(3000L, 3000L, 3000L, 3000L),
    X_Intercept = c(4500L, 4500L, 4500L, 4500L),
  Y_Independent = c(500L, 670L, 700L, 800L),
  Y_Coefficient = c(750L, 750L, 750L, 750L),
    Y_Intercept = c(600L, 600L, 600L, 600L)
)

CodePudding user response：

Here is a possible approach, it uses !!sym(var) to use the text stored in var as a variable name. There are several other ways to do the same thing, see the [Programming with dplyr vignette][1].

prefix_list = c("X_","Y_")

for(prefix in prefix_list){
  this_independent = paste0(prefix,"Independent")
  this_coefficient = paste0(prefix,"Coefficient")
  this_intercept = paste0(prefix,"Intercept")
  this_interdependent = paste0(prefix,"Interpendent")
  this_residual = paste0(prefix,"Residual")

  df = df %>%
    mutate(!!sym(this_residual) := !!sym(this_independent) - !!sym(this_intercept) - !!sym(this_coefficient) * !!sym(this_interdepdendent))
}


  [1]: https://dplyr.tidyverse.org/articles/programming.html