Home > Blockchain >  R: how to combine the value of a column in two rows together, if these two rows share same character
R: how to combine the value of a column in two rows together, if these two rows share same character

Time:06-10

I am constructing an edge list for a network. I would like to combine the value of the third column together if the first two columns are the same. The data I have is like this.

ego    alter   weight
A      B       12
B      A       10
C      D       5
D      C       2
E      F       7
F      E       6

The dataset I expect is like this:

ego    alter   weight
A      B       22
C      D       7
E      F       13

Please enlighten me if you have some great ideas to achieve the expected result. Thank you so much.

CodePudding user response:

You could do the following:

f <- function(e,a) sapply(seq_along(e), \(i) paste0(sort(c(e[i],a[i])), collapse=""))

group_by(dt, grp = f(ego,alter)) %>% 
  summarize(weight=sum(weight),.groups="drop") %>% 
  separate(grp,c("ego","alter"),1)

Output:

  ego   alter weight
  <chr> <chr>  <int>
1 A     B         22
2 C     D          7
3 E     F         13

CodePudding user response:

A possible solution:

library(tidyverse)

df %>% 
  rowwise() %>% 
  mutate(aux = sort(c(ego, alter)) %>% str_c(collapse = "")) %>% 
  group_by(aux) %>% 
  summarise(ego, alter, weight = sum(weight), .groups = "drop") %>% 
  filter(!duplicated(aux)) %>% 
  select(-aux)

#> # A tibble: 3 × 3
#>   ego   alter weight
#>   <chr> <chr>  <int>
#> 1 A     B         22
#> 2 C     D          7
#> 3 E     F         13

Or avoiding rowwise:

library(tidyverse)

df %>% 
  mutate(aux = apply(df[1:2], 1, \(x) sort(x) %>% paste0(collapse = ""))) %>% 
  group_by(aux) %>% 
  summarise(ego, alter, weight = sum(weight), .groups = "drop") %>% 
  filter(!duplicated(aux)) %>% 
  select(-aux)

#> # A tibble: 3 × 3
#>   ego   alter weight
#>   <chr> <chr>  <int>
#> 1 A     B         22
#> 2 C     D          7
#> 3 E     F         13

And yet another solution, a bit more succinct:

library(tidyverse)

df %>% 
  group_by(aux = map2_chr(ego, alter, ~ sort(c(.x, .y)) %>% str_c(collapse = ""))) %>% 
  summarise(weight = sum(weight)) %>% 
  extract(aux, c("ego", "alter"), "([[:upper:]])([[:upper:]])")

#> # A tibble: 3 × 3
#>   ego   alter weight
#>   <chr> <chr>  <int>
#> 1 A     B         22
#> 2 C     D          7
#> 3 E     F         13
  • Related