Home > Net >  Tidyverse column-wise differences
Tidyverse column-wise differences

Time:10-15

Suppose I have a data frame like this:

df = data.frame(preA = c(1,2,3),preB = c(3,4,5),postA = c(6,7,8),postB = c(9,8,4))

I want to add columns having column-wise differences, that is:

diffA = postA - preA
diffB = postB - preB 

and so on...
Is there an efficient way to do this in tidyverse?

CodePudding user response:

The way to go with dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% 
  mutate(id = 1:n()) %>% 
  pivot_longer(-id,
               names_to = c("pre_post", ".value"),
               names_pattern = "(pre|post)(.*)") %>% 
  group_by(id) %>% 
  mutate(across(A:B, diff, .names = "diff{col}")) %>% 
  pivot_wider(names_from = pre_post, values_from = c(A, B),
              names_glue = '{pre_post}{.value}') %>% 
  select(id, starts_with("pre"), starts_with("post"), starts_with("diff"))

#      id  preA  preB postA postB diffA diffB
# 1     1     1     3     6     9     5     6
# 2     2     2     4     7     8     5     4
# 3     3     3     5     8     4     5    -1

A shorter but less flexible was with dplyover::across2:

library(dplyr)
library(dplover)
df %>% 
  #relocate(sort(colnames(.))) %>% 
  mutate(across2(starts_with("post"), starts_with("pre"), `-`,
                 .names = "diff{idx}"))

#      preA  preB postA postB diff1 diff2
# 1       1     3     6     9     5     6
# 2       2     4     7     8     5     4
# 3       3     5     8     4     5    -1

CodePudding user response:

You can do this with two uses of across(), creating new variables with the first use and subtracting the second. This also assumes your columns are in order.

df %>%
  mutate(across(starts_with("post"), .names = "diff{sub('post', '', .col)}") - across(starts_with("pre")))

  preA preB postA postB diffA diffB
1    1    3     6     9     5     6
2    2    4     7     8     5     4
3    3    5     8     4     5    -1

CodePudding user response:

A few more solutions. My favourite is the first one demonstrated here - I think it's the cleanest and most debuggable:

# Setup:
library(dplyr, warn.conflicts = FALSE)
library(glue)

df <- data.frame(
  preA = c(1,2,3), 
  preB = c(3,4,5), 
  postA = c(6,7,8), 
  postB = c(9,8,4)
)

Method 1: Using expressions:

This is my favourite approach. I think it's very readable, and I think it should be reasonably fast compared to solutions using across():

cols         <- c("A", "B")
exprs        <- glue("post{cols} - pre{cols}")
names(exprs) <- glue("diff{cols}")

df |> 
  mutate(!!!rlang::parse_exprs(exprs))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 2: Using mutate() across() get():

Personally, I don't like this sort of thing because I think it's really hard to read:

df |> 
  mutate(across(
    starts_with("post"),
    ~ .x - get(stringr::str_replace_all(cur_column(), "^post", "pre")),
    .names = "diff{stringr::str_remove(.col, '^post')}"
  ))
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1

Method 3: Using base subsetting:

The main advantage here is that you don't need any packages (you can use paste0() instead of glue()), IMO it's also pretty readable. But I don't like that it doesn't play well with |>:

cols <- c("A", "B")
df2  <- df
df2[glue("diff{cols}")] <- df2[glue("post{cols}")] - df2[glue("pre{cols}")]

df2
#>   preA preB postA postB diffA diffB
#> 1    1    3     6     9     5     6
#> 2    2    4     7     8     5     4
#> 3    3    5     8     4     5    -1
  • Related