Suppose I have a data frame like this:
df = data.frame(preA = c(1,2,3),preB = c(3,4,5),postA = c(6,7,8),postB = c(9,8,4))
I want to add columns having column-wise differences, that is:
diffA = postA - preA
diffB = postB - preB
and so on...
Is there an efficient way to do this in tidyverse?
CodePudding user response:
The way to go with dplyr
and tidyr
:
library(dplyr)
library(tidyr)
df %>%
mutate(id = 1:n()) %>%
pivot_longer(-id,
names_to = c("pre_post", ".value"),
names_pattern = "(pre|post)(.*)") %>%
group_by(id) %>%
mutate(across(A:B, diff, .names = "diff{col}")) %>%
pivot_wider(names_from = pre_post, values_from = c(A, B),
names_glue = '{pre_post}{.value}') %>%
select(id, starts_with("pre"), starts_with("post"), starts_with("diff"))
# id preA preB postA postB diffA diffB
# 1 1 1 3 6 9 5 6
# 2 2 2 4 7 8 5 4
# 3 3 3 5 8 4 5 -1
A shorter but less flexible was with dplyover::across2
:
library(dplyr)
library(dplover)
df %>%
#relocate(sort(colnames(.))) %>%
mutate(across2(starts_with("post"), starts_with("pre"), `-`,
.names = "diff{idx}"))
# preA preB postA postB diff1 diff2
# 1 1 3 6 9 5 6
# 2 2 4 7 8 5 4
# 3 3 5 8 4 5 -1
CodePudding user response:
You can do this with two uses of across()
, creating new variables with the first use and subtracting the second. This also assumes your columns are in order.
df %>%
mutate(across(starts_with("post"), .names = "diff{sub('post', '', .col)}") - across(starts_with("pre")))
preA preB postA postB diffA diffB
1 1 3 6 9 5 6
2 2 4 7 8 5 4
3 3 5 8 4 5 -1
CodePudding user response:
A few more solutions. My favourite is the first one demonstrated here - I think it's the cleanest and most debuggable:
# Setup:
library(dplyr, warn.conflicts = FALSE)
library(glue)
df <- data.frame(
preA = c(1,2,3),
preB = c(3,4,5),
postA = c(6,7,8),
postB = c(9,8,4)
)
Method 1: Using expressions:
This is my favourite approach. I think it's very readable, and I think it should be reasonably fast compared to solutions using across()
:
cols <- c("A", "B")
exprs <- glue("post{cols} - pre{cols}")
names(exprs) <- glue("diff{cols}")
df |>
mutate(!!!rlang::parse_exprs(exprs))
#> preA preB postA postB diffA diffB
#> 1 1 3 6 9 5 6
#> 2 2 4 7 8 5 4
#> 3 3 5 8 4 5 -1
Method 2: Using mutate()
across()
get()
:
Personally, I don't like this sort of thing because I think it's really hard to read:
df |>
mutate(across(
starts_with("post"),
~ .x - get(stringr::str_replace_all(cur_column(), "^post", "pre")),
.names = "diff{stringr::str_remove(.col, '^post')}"
))
#> preA preB postA postB diffA diffB
#> 1 1 3 6 9 5 6
#> 2 2 4 7 8 5 4
#> 3 3 5 8 4 5 -1
Method 3: Using base subsetting:
The main advantage here is that you don't need any packages (you can use paste0()
instead of glue()
), IMO it's also pretty readable. But I don't like that it doesn't play well with |>
:
cols <- c("A", "B")
df2 <- df
df2[glue("diff{cols}")] <- df2[glue("post{cols}")] - df2[glue("pre{cols}")]
df2
#> preA preB postA postB diffA diffB
#> 1 1 3 6 9 5 6
#> 2 2 4 7 8 5 4
#> 3 3 5 8 4 5 -1