R - Calculate the differences in column values between rows/observations (all combinations)

Time:10-24

I have an R data frame like the one in the following example. I want to calculate the differences in the column values between rows (observations) for all combinations of rows.

library(tibble)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))

> my_df
# A tibble: 5 x 3
       a     b      c
   <dbl> <dbl>  <dbl>
1 0.0513 0.267 0.846 
2 0.614  0.683 0.937 
3 0.230  0.700 0.0651
4 0.671  0.110 0.901 
5 0.424  0.520 0.817 

I have tried the code below, but it only gives me the differences between consecutive rows. I want all combinations: row2 - row1, row3 - row1, row4 - row1, row5 - row1, row3 - row2, row4 - row2, and so on.

The code I wrote also does not seem ideal to me: it produces output in the form I want, but not for all possible combinations.

my_diff <- as.data.frame(diff(as.matrix(my_df)))
> my_diff
           a           b           c
1  0.5623574  0.41522579  0.09165630
2 -0.3837289  0.01755953 -0.87209740
3  0.4407068 -0.58982681  0.83540813
4 -0.2463205  0.40943495 -0.08358985

I would appreciate help solving this in R, if possible using tidyverse functions.

Thanks.

CodePudding user response:

Let me know if this is what you were expecting.

library(tibble)

my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))

# Row indices from which the pairs are built
seq1 <- seq_len(nrow(my_df))
seq2 <- seq1

# All ordered pairs of row indices (including i == j and both orders)
Combinations <- expand.grid(seq1, seq2)
# Keep each unordered pair once: only the pairs where Var2 < Var1
Combinations <- Combinations[Combinations$Var2 < Combinations$Var1, ]

# Subtract the lower-indexed row (Var2) from the higher-indexed row (Var1)
result <- my_df[Combinations$Var1, ] - my_df[Combinations$Var2, ]

Update based on the comment:

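library(dplyr)
# Var1 and Var2 stay in the result, labelling each difference as row Var1 - row Var2
result <- expand.grid(seq1, seq1) %>%
  filter(Var1 > Var2) %>%
  mutate(my_df[Var1, ] - my_df[Var2, ])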
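
For comparison, the same pairs can also be generated with base R's combn(), which yields each pair exactly once and so needs no filtering step. This is only a minimal sketch, not part of the answer above: it assumes a plain data.frame instead of a tibble, fixes a seed so the run is reproducible, and the names pairs, lo and hi are illustrative.

```r
# Base R only; no packages needed.
set.seed(1)  # assumption: fixed seed, only for reproducibility
my_df <- data.frame(a = runif(5), b = runif(5), c = runif(5))

# Each column of `pairs` holds one pair of row indices (i, j) with i < j.
pairs <- combn(nrow(my_df), 2)
lo <- pairs[1, ]  # lower row index of each pair
hi <- pairs[2, ]  # higher row index of each pair

# Subtract the lower-indexed row from the higher-indexed row.
result <- my_df[hi, ] - my_df[lo, ]

# Label each difference, e.g. "3-1" means row 3 minus row 1.
rownames(result) <- paste0(hi, "-", lo)
result
```

Storing the comparison in the row names keeps the result purely numeric; if a separate label column is preferred, the same paste0() call can be assigned to a new column instead.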

CodePudding user response:

UPDATE: A tidyverse-friendly solution:

library(tidyverse)
set.seed(1)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))

gives:

# A tibble: 5 x 3
      a      b     c
  <dbl>  <dbl> <dbl>
1 0.266 0.898  0.206
2 0.372 0.945  0.177
3 0.573 0.661  0.687
4 0.908 0.629  0.384
5 0.202 0.0618 0.770

And from there:

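my_df %>%
  mutate(ID = row_number()) %>%
  # combn() lists every pair (i, j) with i < j; after t() and as.numeric(),
  # slice() returns all first members of the pairs, then all second members
  slice(as.numeric(t(combn(1:nrow(.), 2)))) %>%
  # number the pairs so that both members of a pair share one group
  mutate(group = rep(1:(n()/2), 2)) %>%
  group_by(group) %>%
  # within each group, element 1 is the lower-indexed row, element 2 the higher
  summarize(comparison = paste0(ID[2], "-", ID[1]),
            across(c(a, b, c), ~ .[2] - .[1])) %>%
  select(-group)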

which gives:

# A tibble: 10 x 4
   comparison       a       b       c
   <chr>        <dbl>   <dbl>   <dbl>
 1 2-1         0.107   0.0463 -0.0294
 2 3-1         0.307  -0.238   0.481 
 3 4-1         0.643  -0.269   0.178 
 4 5-1        -0.0638 -0.837   0.564 
 5 3-2         0.201  -0.284   0.510 
 6 4-2         0.536  -0.316   0.208 
 7 5-2        -0.170  -0.883   0.593 
 8 4-3         0.335  -0.0317 -0.303 
 9 5-3        -0.371  -0.599   0.0828
10 5-4        -0.707  -0.567   0.386  
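
As a side note on the diff() attempt in the question: diff() only returned consecutive differences because its lag argument defaults to 1. The sketch below, which assumes base R only and uses the illustrative names m, by_lag and all_diffs, collects the differences for every lag from 1 to n - 1 and so covers all pairs. The rows come out ordered by lag (2-1, 3-2, 4-3, 5-4, then 3-1, 4-2, and so on), not in the order of the table above.

```r
# Base R only; diff() on a matrix works column by column.
set.seed(1)  # same seed as the answer above, so the data match
m <- as.matrix(data.frame(a = runif(5), b = runif(5), c = runif(5)))
n <- nrow(m)

# diff(m, lag = k) returns row (i + k) minus row i for every valid i.
by_lag <- lapply(seq_len(n - 1), function(k) diff(m, lag = k))

# Stack the per-lag results into one matrix with choose(n, 2) rows.
all_diffs <- do.call(rbind, by_lag)
nrow(all_diffs)
```

This keeps everything numeric in a matrix, at the cost of losing the explicit comparison label that the tidyverse version provides.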