How can I calculate the sum of the column-wise differences using dplyr?

Despite using R and dplyr on a regular basis, I can't figure out how to calculate the sum of the absolute differences between consecutive columns:

sum_diff = abs(A - B) + abs(B - C) + abs(C - D) + ...

A	B	C	D	sum_diff
1	2	3	4	3
2	1	3	4	4
1	2	1	1	2
4	1	2	1	5

I know I could iterate over all columns with a for loop, but given the size of my data frame I would prefer a more elegant and faster solution.

Any help?

Thank you

Answer:

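In base R, we can drop the last column (leaving A, B, C) and, separately, the first column (leaving B, C, D), subtract one from the other to get all adjacent differences in a single vectorised step, and then use rowSums on the absolute values. This avoids an explicit loop and should be efficient compared to a package-based solution.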

df1$sum_diff <- rowSums(abs(df1[-ncol(df1)] - df1[-1]))

-output

> df1
  A B C D sum_diff
1 1 2 3 4        3
2 2 1 3 4        4
3 1 2 1 1        2
4 4 1 2 1        5
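To see why the one-liner works, it can help to print the intermediate differences before taking rowSums. The sketch below rebuilds the same example data and shows that subtracting the two trimmed data frames produces one column per adjacent pair.

```r
df1 <- data.frame(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L),
                  C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L))

# df1[-ncol(df1)] keeps A, B, C; df1[-1] keeps B, C, D.
# Subtracting two equally sized data frames works column by column,
# so the three result columns hold A - B, B - C and C - D.
diffs <- df1[-ncol(df1)] - df1[-1]
print(diffs)

# Summing the absolute values across each row gives sum_diff.
sum_diff <- rowSums(abs(diffs))
print(sum_diff)
```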

Another option is rowDiffs() from the matrixStats package, which works on a matrix:

library(matrixStats)
rowSums(abs(rowDiffs(as.matrix(df1))))
[1] 3 4 2 5
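Like base diff(), rowDiffs() takes each column minus the one before it, so the signs are flipped relative to A - B; the abs() makes that irrelevant. To show what is being computed without the extra dependency, the sketch below uses base apply() with diff() as a stand-in. It is an illustration, not matrixStats' implementation, and a row-by-row apply() is likely to be slower than the vectorised subtraction on large data.

```r
df1 <- data.frame(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L),
                  C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L))
m <- as.matrix(df1[c("A", "B", "C", "D")])

# For each row, diff() returns the lag-1 differences (B - A, C - B, D - C).
# apply() stacks these results as columns, so t() turns them back into
# one row per observation.
row_diffs <- t(apply(m, 1, diff))
print(row_diffs)

sum_diff <- rowSums(abs(row_diffs))
print(sum_diff)
```

One caveat: with only two columns, diff() returns a single value per row, apply() then returns a plain vector and t() yields a one-row matrix, so this stand-in would need adjusting in that case.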

data

df1 <- structure(list(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L), 
    C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L)), row.names = c(NA, 
-4L), class = "data.frame")
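Since the question asks about dplyr, the same position-based subtraction can be wrapped in mutate(). This is a sketch assuming dplyr 1.1.0 or later (for pick()); adjacent_abs_diff() is a hypothetical helper defined here, not a dplyr function.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

df1 <- data.frame(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L),
                  C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L))

# Hypothetical helper (not part of dplyr): row-wise sum of |x[j] - x[j + 1]|
# over consecutive columns. drop = FALSE keeps matrices even with 2 columns.
adjacent_abs_diff <- function(x) {
  m <- as.matrix(x)
  rowSums(abs(m[, -ncol(m), drop = FALSE] - m[, -1, drop = FALSE]))
}

res <- df1 %>%
  mutate(sum_diff = adjacent_abs_diff(pick(A:D)))
print(res)
```

Selecting the columns with pick() keeps the computation inside the dplyr pipeline, while the helper converts them to a matrix so the subtraction stays a single vectorised operation.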

Answer:

Data from akrun (many thanks)!

This is a bit complicated. The idea is to generate a list of the adjacent column pairs. I tried it with combn, but that returns all possible pairs rather than only the adjacent ones, so I created the list by hand.

With these pairs we can then use purrr's map_dfc and do some data wrangling afterwards:

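library(tidyverse)

# Adjacent column pairs, written out by hand
combinations <- list(c("A", "B"), c("B", "C"), c("C", "D"))

# One column per pair holding the difference, e.g. A_v_B = A - B
purrr::map_dfc(combinations, ~ {
  df <- tibble(a = data[[.x[[1]]]] - data[[.x[[2]]]])
  names(df) <- paste0(.x[[1]], "_v_", .x[[2]])
  df
}) %>%
  # Row-wise sum of the absolute differences, then re-attach the original columns
  transmute(sum_diff = rowSums(abs(.))) %>%
  bind_cols(data)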

  sum_diff     A     B     C     D
     <dbl> <int> <int> <int> <int>
1        3     1     2     3     4
2        4     2     1     3     4
3        2     1     2     1     1
4        5     4     1     2     1
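The pair list does not have to be written by hand: pairing each column name with its right-hand neighbour gives exactly the adjacent pairs that combn does not. The sketch below does this in base R with mapply() (a substitute for typing the list, not part of the original answer). The resulting combinations list has the same shape as the hand-written one, so it could be passed to the map_dfc call unchanged; the last step is a base R check of the same sum.

```r
data <- data.frame(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L),
                   C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L))

cols <- names(data)
# Pair every column with its right-hand neighbour:
# head() drops the last name, tail() drops the first.
combinations <- mapply(c, head(cols, -1), tail(cols, -1),
                       SIMPLIFY = FALSE, USE.NAMES = FALSE)
str(combinations)

# Absolute difference for each pair, then add the pairs up row-wise.
sum_diff <- Reduce(`+`, lapply(combinations,
                               function(p) abs(data[[p[1]]] - data[[p[2]]])))
print(sum_diff)
```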

data:

data <- structure(list(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L), 
    C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L)), row.names = c(NA, 
-4L), class = "data.frame")