Compute the difference between two columns by pair in R

Time:06-07

I have the following data:

names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- cbind.data.frame(names, scores)

I want to "extend" this data frame so that it has one row for every possible pair of names (each combination once, without repetition), like so:

names_1 <- c("a", "a", "a", "b", "b", "c")
names_2 <- c("b", "c", "d", "c", "d", "d")
scores_1 <- c(95, 95, 95, 55, 55, 100)
scores_2 <- c(55, 100, 60, 100, 60, 60)

df_extended <- cbind.data.frame(names_1, names_2, scores_1, scores_2)

In the extended data, scores_1 are the scores for the corresponding name in names_1, and scores_2 are for names_2.

The following code creates the name pairs, but I do not know how to attach the corresponding scores afterwards:

t(combn(df$names,2))
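One way to attach the scores to these pairs, sketched under the assumption that each name occurs only once in df: turn the scores into a named vector and index it with each column of the pair matrix. The lookup vector score_of is an illustrative name, not something from the question.

```r
# Sample data from the question
names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- data.frame(names, scores, stringsAsFactors = FALSE)

# Named vector used as a lookup table: score_of["a"] is the score of "a".
# Assumes names are unique; with duplicates, indexing returns the first match.
score_of <- setNames(df$scores, df$names)

# Each row of pairs is one unordered pair of names
pairs <- t(combn(df$names, 2))

df_extended <- data.frame(
  names_1  = pairs[, 1],
  names_2  = pairs[, 2],
  scores_1 = unname(score_of[pairs[, 1]]),
  scores_2 = unname(score_of[pairs[, 2]])
)
df_extended$score_diff <- abs(df_extended$scores_1 - df_extended$scores_2)
df_extended
```

Indexing a named vector with a character vector returns the elements in the requested order, so scores_1 and scores_2 line up row by row with names_1 and names_2.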

The final goal is the row-wise absolute difference between scores_1 and scores_2:

df_extended$score_diff <- abs(df_extended$scores_1 - df_extended$scores_2)
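When only the difference matters, a possible shortcut (a sketch, not taken from the question) is stats::dist(). On a single numeric column its Euclidean distance is exactly |x_i - x_j|, and it stores the lower triangle column by column, i.e. in the pair order (1,2), (1,3), ..., (n-1,n) that combn() also uses.

```r
names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- data.frame(names, scores, stringsAsFactors = FALSE)

# Pairs in combn() order: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs <- t(combn(df$names, 2))

# dist() on one numeric column gives |x_i - x_j|; as.vector() drops the
# "dist" attributes and keeps the lower triangle in the same pair order
score_diff <- as.vector(dist(df$scores))

df_extended <- data.frame(names_1 = pairs[, 1], names_2 = pairs[, 2], score_diff)
df_extended
```

This skips building scores_1 and scores_2 altogether, so it only fits when those columns are not needed in the result.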

Answer:

First, create a data frame with the unique pairs of names. Then merge the original data twice, once on names_1 and once on names_2, to look up the score for each name in the pair.

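names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- cbind.data.frame(names, scores)

# One row per unordered pair of names
new_df <- data.frame(t(combn(df$names, 2)))
names(new_df) <- c("names_1", "names_2")

# Look up the score for names_1, then for names_2
new_df <- merge(new_df, df, by.x = 'names_1', by.y = 'names')
new_df <- merge(new_df, df, by.x = 'names_2', by.y = 'names')

# merge() puts the key column first, so the columns are now
# names_2, names_1, scores.x (for names_1), scores.y (for names_2)
names(new_df)[3] <- "scores_1"; names(new_df)[4] <- "scores_2"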

> new_df
  names_2 names_1 scores_1 scores_2
1       b       a       95       55
2       c       a       95      100
3       c       b       55      100
4       d       a       95       60
5       d       b       55       60
6       d       c      100       60
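The merge-based result above is sorted by names_2 and does not yet contain the difference. A sketch of finishing it, assuming the question's column and row order is wanted: merge()'s suffixes argument can name the two score columns directly, and order() restores the pair order.

```r
names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- data.frame(names, scores, stringsAsFactors = FALSE)

pairs <- data.frame(t(combn(df$names, 2)), stringsAsFactors = FALSE)
names(pairs) <- c("names_1", "names_2")

# First lookup adds a plain "scores" column for names_1
new_df <- merge(pairs, df, by.x = "names_1", by.y = "names")
# Second lookup clashes on "scores", so suffixes yields scores_1 and scores_2
new_df <- merge(new_df, df, by.x = "names_2", by.y = "names",
                suffixes = c("_1", "_2"))

# merge() sorts by its key; restore the question's row and column order
new_df <- new_df[order(new_df$names_1, new_df$names_2),
                 c("names_1", "names_2", "scores_1", "scores_2")]
rownames(new_df) <- NULL
new_df$score_diff <- abs(new_df$scores_1 - new_df$scores_2)
new_df
```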

Answer:

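# \(x) is the lambda shorthand introduced in R 4.1.0.
# df$scores[df$names %in% x] returns scores in df row order, which matches x
# because combn() keeps the input order; c() coerces everything to character.
df_ext <- data.frame(t(combn(df$names, 2, \(x) c(x, df$scores[df$names %in% x]))))
# type.convert() turns the score columns back into numbers
df_ext <- setNames(type.convert(df_ext, as.is = TRUE), c('names_1', 'names_2', 'scores_1', 'scores_2'))

df_ext
  names_1 names_2 scores_1 scores_2
1       a       b       95       55
2       a       c       95      100
3       a       d       95       60
4       b       c       55      100
5       b       d       55       60
6       c       d      100       60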
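In the approach above, c(x, ...) forces the scores to character, which is why type.convert() is needed afterwards. A sketch of a variant that keeps the original types, assuming a list-then-bind step is acceptable: let FUN return a one-row data frame per pair of row indices, use simplify = FALSE, and stack the pieces with rbind.

```r
names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- data.frame(names, scores, stringsAsFactors = FALSE)

# FUN receives one pair of row indices i; simplify = FALSE returns a list
rows <- combn(seq_len(nrow(df)), 2, FUN = function(i) {
  data.frame(names_1  = df$names[i[1]], names_2  = df$names[i[2]],
             scores_1 = df$scores[i[1]], scores_2 = df$scores[i[2]],
             stringsAsFactors = FALSE)
}, simplify = FALSE)

# Stack the one-row data frames; scores stay numeric throughout
df_ext <- do.call(rbind, rows)
df_ext
```

Binding many small data frames is slow when the number of pairs grows large, so for big inputs an index-based vectorised approach is likely preferable.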

Answer:

names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- cbind.data.frame(names, scores)

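library(tidyverse)
# For each column of df, list every pair of values as a two-column tibble
map(df, ~ combn(x = .x, m = 2) %>% t() %>% as_tibble()) %>% 
  # Bind the tibbles side by side as names_1, names_2, scores_1, scores_2
  imap_dfc(~ set_names(x = .x, nm = paste(.y, seq(ncol(.x)), sep = "_"))) %>% 
  # Signed difference; wrap it in abs() to match the question's score_diff
  mutate(score_diff = scores_1 - scores_2)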

#> # A tibble: 6 × 5
#>   names_1 names_2 scores_1 scores_2 score_diff
#>   <chr>   <chr>      <dbl>    <dbl>      <dbl>
#> 1 a       b             95       55         40
#> 2 a       c             95      100         -5
#> 3 a       d             95       60         35
#> 4 b       c             55      100        -45
#> 5 b       d             55       60         -5
#> 6 c       d            100       60         40

Created on 2022-06-06 by the reprex package (v2.0.1)
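For readers who would rather avoid the tidyverse dependency, a base R sketch of a different technique: a cross join with merge(by = NULL) followed by a filter. It assumes names are unique and already sorted, so that keeping names_1 < names_2 reproduces the combn() pairs; the names left, right and all_pairs are illustrative.

```r
names <- c("a", "b", "c", "d")
scores <- c(95, 55, 100, 60)
df <- data.frame(names, scores, stringsAsFactors = FALSE)

# Rename the columns so the two copies stay distinguishable after the join
left  <- setNames(df, c("names_1", "scores_1"))
right <- setNames(df, c("names_2", "scores_2"))

# by = NULL gives the Cartesian product: all 4 x 4 = 16 ordered pairs
all_pairs <- merge(left, right, by = NULL)

# Keep each unordered pair once and drop self-pairs (assumes unique names)
df_ext <- all_pairs[all_pairs$names_1 < all_pairs$names_2, ]
df_ext <- df_ext[order(df_ext$names_1, df_ext$names_2), ]
rownames(df_ext) <- NULL

# Signed difference, as in the tidyverse answer above
df_ext$score_diff <- df_ext$scores_1 - df_ext$scores_2
df_ext
```

The join materialises n^2 rows before filtering, roughly twice the memory of enumerating pairs directly with combn(), which rarely matters for small inputs.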
