How can I calculate the sum of the column wise differences using dplyr

Time:04-13

I use R and dplyr regularly, but I cannot work out how to calculate, for each row, the sum of the absolute differences between adjacent columns:

sum_diff = ABS(A-B) + ABS(B-C) + ABS(C-D) + ...

A B C D sum_diff
1 2 3 4 3
2 1 3 4 4
1 2 1 1 2
4 1 2 1 5

I know I could iterate using a for loop over all columns, but given the size of my data frame, I prefer a more elegant and fast solution.

Any help?

Thank you

CodePudding user response:

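We can drop the last column from one copy of the data and the first column from another, so that each column lines up with its right-hand neighbour, subtract the two, and use rowSums on the absolute values in base R. This could be very efficient compared to a package solution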

df1$sum_diff <- rowSums(abs(df1[-ncol(df1)] - df1[-1]))

-output

> df1
  A B C D sum_diff
1 1 2 3 4        3
2 2 1 3 4        4
3 1 2 1 1        2
4 4 1 2 1        5
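To make the column alignment explicit, here is the same one-liner split into steps. This is only a sketch on the example data; the intermediate names left, right and diffs are introduced here for illustration.

```r
# Example data from the question
df1 <- data.frame(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L),
                  C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L))

left  <- df1[-ncol(df1)]  # all but the last column: A, B, C
right <- df1[-1]          # all but the first column: B, C, D

# Element-wise subtraction pairs A with B, B with C, C with D
diffs <- abs(left - right)

# Row-wise total of the absolute adjacent differences
sum_diff <- rowSums(diffs)
```

Because the subtraction is done on whole data frames at once, no explicit loop over columns is needed, however many columns there are.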

Or another option is rowDiffs from matrixStats

library(matrixStats)
rowSums(abs(rowDiffs(as.matrix(df1))))
[1] 3 4 2 5
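If adding matrixStats is not an option, a similar result can probably be had in base R, because diff applied to a matrix takes differences between successive rows. The sketch below transposes, differences, and transposes back; it is an alternative for illustration, not what rowDiffs does internally.

```r
# Example data from the question
df1 <- data.frame(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L),
                  C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L))

m <- as.matrix(df1)

# t(m) has one row per original column, so diff() yields B-A, C-B, D-C;
# transposing back gives one row per original row again
adjacent <- t(diff(t(m)))

sum_diff <- rowSums(abs(adjacent))
```

The two transposes copy the data, so on a very large data frame the column-offset subtraction or rowDiffs may be the better choice.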

data

df1 <- structure(list(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L), 
    C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L)), row.names = c(NA, 
-4L), class = "data.frame")

CodePudding user response:

Data from akrun (many thanks)!

This is a bit more complicated. The idea is to generate a list of the column pairs to compare. I tried combn, but that returns all possible pairs rather than only the adjacent ones, so I created the list by hand.

With these combinations we can then use purrr's map_dfc and do some data wrangling afterwards:

library(tidyverse)

combinations <-list(c("A", "B"), c("B", "C"), c("C","D"))

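# For each pair, build a one-column tibble holding the difference,
# named e.g. "A_v_B"; map_dfc binds these columns side by side
purrr::map_dfc(combinations, ~{
  df <- tibble(a = data[[.x[[1]]]] - data[[.x[[2]]]])
  names(df) <- paste0(.x[[1]], "_v_", .x[[2]])
  df
}) %>% 
  # Sum the absolute differences per row, then re-attach the original columns
  transmute(sum_diff = rowSums(abs(.))) %>% 
  bind_cols(data)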
  sum_diff     A     B     C     D
     <dbl> <int> <int> <int> <int>
1        3     1     2     3     4
2        4     2     1     3     4
3        2     1     2     1     1
4        5     4     1     2     1
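The hand-written list of pairs could probably be generated from the column names instead, by pairing every name except the last with every name except the first. The following is a base R sketch under that assumption; cols and pair_diffs are names introduced here for illustration.

```r
# Example data from the question
data <- data.frame(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L),
                   C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L))

cols <- names(data)

# Pair each column name with its right-hand neighbour:
# ("A","B"), ("B","C"), ("C","D")
combinations <- unname(Map(c, head(cols, -1), tail(cols, -1)))

# One column of absolute differences per pair, collected into a matrix
pair_diffs <- sapply(combinations, function(p) abs(data[[p[1]]] - data[[p[2]]]))

sum_diff <- rowSums(pair_diffs)
```

A list built this way can be passed to the map_dfc call in place of the hand-written one, which avoids editing the pairs whenever columns are added.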

data:

data <- structure(list(A = c(1L, 2L, 1L, 4L), B = c(2L, 1L, 2L, 1L), 
    C = c(3L, 3L, 1L, 2L), D = c(4L, 4L, 1L, 1L)), row.names = c(NA, 
-4L), class = "data.frame")