Tidyverse column-wise differences-CodePudding

Suppose I have a data frame like this:

df = data.frame(preA = c(1,2,3),preB = c(3,4,5),postA = c(6,7,8),postB = c(9,8,4))

I want to add columns having column-wise differences, that is:

diffA = postA - preA
diffB = postB - preB

and so on...
Is there an efficient way to do this in tidyverse?

CodePudding user response：

The way to go with dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% 
  mutate(id = 1:n()) %>% 
  pivot_longer(-id,
               names_to = c("pre_post", ".value"),
               names_pattern = "(pre|post)(.*)") %>% 
  group_by(id) %>% 
  mutate(across(A:B, diff, .names = "diff{col}")) %>% 
  pivot_wider(names_from = pre_post, values_from = c(A, B),
              names_glue = '{pre_post}{.value}') %>% 
  select(id, starts_with("pre"), starts_with("post"), starts_with("diff"))

#      id  preA  preB postA postB diffA diffB
# 1     1     1     3     6     9     5     6
# 2     2     2     4     7     8     5     4
# 3     3     3     5     8     4     5    -1

A shorter but less flexible was with dplyover::across2:

library(dplyr)
library(dplover)
df %>% 
  #relocate(sort(colnames(.))) %>% 
  mutate(across2(starts_with("post"), starts_with("pre"), `-`,
                 .names = "diff{idx}"))

#      preA  preB postA postB diff1 diff2
# 1       1     3     6     9     5     6
# 2       2     4     7     8     5     4
# 3       3     5     8     4     5    -1

CodePudding user response：

You can do this with two uses of across(), creating new variables with the first use and subtracting the second. This also assumes your columns are in order.

df %>%
  mutate(across(starts_with("post"), .names = "diff{sub('post', '', .col)}") - across(starts_with("pre")))

  preA preB postA postB diffA diffB
1    1    3     6     9     5     6
2    2    4     7     8     5     4
3    3    5     8     4     5    -1

CodePudding user response：

A few more solutions. My favourite is the first one demonstrated here - I think it's the cleanest and most debuggable:

# Setup:
library(dplyr, warn.conflicts = FALSE)
library(glue)

df <- data.frame(
  preA = c(1,2,3), 
  preB = c(3,4,5), 
  postA = c(6,7,8), 
  postB = c(9,8,4)
)

Method 1: Using expressions:

This is my favourite approach. I think it's very readable, and I think it should be reasonably fast compared to solutions using across():

cols         <- c("A", "B")
exprs        <- glue("post{cols} - pre{cols}")
names(exprs) <- glue("diff{cols}")

df |> 
  mutate(!!!rlang::parse_exprs(exprs))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 2: Using `mutate()` `across()` `get()`:

Personally, I don't like this sort of thing because I think it's really hard to read:

df |> 
  mutate(across(
    starts_with("post"),
    ~ .x - get(stringr::str_replace_all(cur_column(), "^post", "pre")),
    .names = "diff{stringr::str_remove(.col, '^post')}"
  ))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 3: Using base subsetting:

The main advantage here is that you don't need any packages (you can use paste0() instead of glue()), IMO it's also pretty readable. But I don't like that it doesn't play well with |>:

cols <- c("A", "B")
df2  <- df
df2[glue("diff{cols}")] <- df2[glue("post{cols}")] - df2[glue("pre{cols}")]

df2
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 1: Using expressions:

Method 2: Using mutate() across() get():

Method 3: Using base subsetting:

Method 2: Using `mutate()` `across()` `get()`: