I have an R data frame like the example below. I want to calculate the differences in the column values between observations (rows), for all pairwise combinations of rows.
library(tibble)
my_df <- tibble(a = runif(5), b = runif(5), c = runif(5))
> my_df
# A tibble: 5 x 3
       a     b      c
   <dbl> <dbl>  <dbl>
1 0.0513 0.267 0.846
2 0.614  0.683 0.937
3 0.230  0.700 0.0651
4 0.671  0.110 0.901
5 0.424  0.520 0.817
I have tried the code below, but it only gives the differences between consecutive rows. I want all combinations: row2 - row1, row3 - row1, row4 - row1, row5 - row1, row3 - row2, row4 - row2, and so on.
I also doubt that my code is the best approach: it produces the kind of result I want, but only for consecutive rows, not for all possible combinations.
my_diff <- as.data.frame(diff(as.matrix(my_df)))
> my_diff
           a           b           c
1  0.5623574  0.41522579  0.09165630
2 -0.3837289  0.01755953 -0.87209740
3  0.4407068 -0.58982681  0.83540813
4 -0.2463205  0.40943495 -0.08358985
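For reference, the pairs requested above (row2 - row1, row3 - row1, ..., row5 - row4) are exactly the 2-element combinations of the row indices, and base R's combn() lists them in that same order. A minimal sketch, assuming 5 rows as in the example (the names `pairs` and `labels` are only illustrative):

```r
# Every pair (i, j) with i < j; combn() enumerates them in the order listed above
pairs <- combn(5, 2)                   # 2 x 10 matrix, one pair per column
labels <- paste0("row", pairs[2, ], " - row", pairs[1, ])
labels
# first: "row2 - row1", fifth: "row3 - row2", last: "row5 - row4"
```

For n rows there are choose(n, 2) such pairs, versus the n - 1 consecutive pairs that diff() returns.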
I would appreciate help solving this in R, ideally using tidyverse functions.
Thanks.
Answer:
Let me know if this is what you were looking for.
library(tibble)
my_df <- tibble(a = runif(5), b = runif(5), c = runif(5))
# Generating the sequence of row indices
seq1 <- seq_len(nrow(my_df))
seq2 <- seq1
# Generating all (Var1, Var2) index combinations
Combinations <- expand.grid(seq1, seq2)
# Keeping only pairs with Var1 > Var2 (drops self-pairs and reversed duplicates)
Combinations <- Combinations[Combinations$Var2 < Combinations$Var1, ]
# Performing the subtraction: row Var1 minus row Var2
result <- my_df[Combinations$Var1, ] - my_df[Combinations$Var2, ]
Update based on the comment:
library(dplyr)
result <- expand.grid(seq1, seq1) %>%
  filter(Var1 > Var2) %>%
  mutate(my_df[Var1, ] - my_df[Var2, ])
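A possible simplification of this answer, sketched with base R only: combn() generates the i < j index pairs directly, in the same order as the filtered expand.grid() table, so neither the full n x n grid nor the filter step is needed. A plain data.frame stands in for the tibble so the sketch needs no packages, and the subtraction is done on a numeric matrix; the names `m`, `pairs` and `result` are illustrative.

```r
set.seed(1)
my_df <- data.frame(a = runif(5), b = runif(5), c = runif(5))

m <- as.matrix(my_df)                  # numeric matrix with columns a, b, c
pairs <- combn(nrow(m), 2)             # 2 x choose(n, 2): row 1 = earlier row, row 2 = later row

# Later row minus earlier row, one output row per pair
result <- m[pairs[2, ], , drop = FALSE] - m[pairs[1, ], , drop = FALSE]
rownames(result) <- paste0(pairs[2, ], "-", pairs[1, ])
result <- as.data.frame(result)
result
```

The consecutive-pair rows ("2-1", "3-2", "4-3", "5-4") reproduce what diff(as.matrix(my_df)) returns, which is a quick sanity check on the ordering.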
Answer:
UPDATE: a tidyverse-friendly solution:
library(tidyverse)
set.seed(1)
my_df <- tibble(a=runif(5), b=runif(5), c=runif(5))
gives:
# A tibble: 5 x 3
      a      b     c
  <dbl>  <dbl> <dbl>
1 0.266 0.898  0.206
2 0.372 0.945  0.177
3 0.573 0.661  0.687
4 0.908 0.629  0.384
5 0.202 0.0618 0.770
And from there:
my_df %>%
  mutate(ID = row_number()) %>%
  slice(as.numeric(t(combn(1:nrow(.), 2)))) %>%
  mutate(group = rep(1:(n()/2), 2)) %>%
  group_by(group) %>%
  summarize(comparison = paste0(ID[2], "-", ID[1]),
            across(c(a, b, c), ~ .[2] - .[1])) %>%
  select(-group)
which gives:
# A tibble: 10 x 4
   comparison       a       b       c
   <chr>        <dbl>   <dbl>   <dbl>
 1 2-1         0.107   0.0463 -0.0294
 2 3-1         0.307  -0.238   0.481
 3 4-1         0.643  -0.269   0.178
 4 5-1        -0.0638 -0.837   0.564
 5 3-2         0.201  -0.284   0.510
 6 4-2         0.536  -0.316   0.208
 7 5-2        -0.170  -0.883   0.593
 8 4-3         0.335  -0.0317 -0.303
 9 5-3        -0.371  -0.599   0.0828
10 5-4        -0.707  -0.567   0.386
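The least obvious step in the pipeline above is the slice()/group pairing. The sketch below (base R, assuming 5 rows as in the example; `idx`, `group` and `pairs` are illustrative names) rebuilds just the index vector that slice() receives and groups it the same way, to show why ID[2] - ID[1] is always the later row minus the earlier one:

```r
# Row indices passed to slice(): first the smaller index of every pair, then the larger
idx <- as.numeric(t(combn(1:5, 2)))    # 1 1 1 1 2 2 2 3 3 4 2 3 4 5 3 4 5 4 5 5

# Same grouping as mutate(group = rep(1:(n()/2), 2)): positions k and k + 10 share a group
group <- rep(1:(length(idx) / 2), 2)
pairs <- split(idx, group)

pairs[["1"]]    # c(1, 2): ID[1] = 1, ID[2] = 2, so .[2] - .[1] is row 2 minus row 1
pairs[["10"]]   # c(4, 5): row 5 minus row 4
```

Because slice() keeps rows in the order of the indices it is given, each group holds its earlier row first and its later row second.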