I am constructing an edge list for a network. I would like to combine the value of the third column together if the first two columns are the same. The data I have is like this.
ego alter weight
A B 12
B A 10
C D 5
D C 2
E F 7
F E 6
The dataset I expect is like this:
ego alter weight
A B 22
C D 7
E F 13
Please enlighten me if you have some great ideas to achieve the expected result. Thank you so much.
CodePudding user response:
You could do the following:
f <- function(e,a) sapply(seq_along(e), \(i) paste0(sort(c(e[i],a[i])), collapse=""))
group_by(dt, grp = f(ego,alter)) %>%
summarize(weight=sum(weight),.groups="drop") %>%
separate(grp,c("ego","alter"),1)
Output:
ego alter weight
<chr> <chr> <int>
1 A B 22
2 C D 7
3 E F 13
CodePudding user response:
A possible solution:
library(tidyverse)
df %>%
rowwise() %>%
mutate(aux = sort(c(ego, alter)) %>% str_c(collapse = "")) %>%
group_by(aux) %>%
summarise(ego, alter, weight = sum(weight), .groups = "drop") %>%
filter(!duplicated(aux)) %>%
select(-aux)
#> # A tibble: 3 × 3
#> ego alter weight
#> <chr> <chr> <int>
#> 1 A B 22
#> 2 C D 7
#> 3 E F 13
Or avoiding rowwise
:
library(tidyverse)
df %>%
mutate(aux = apply(df[1:2], 1, \(x) sort(x) %>% paste0(collapse = ""))) %>%
group_by(aux) %>%
summarise(ego, alter, weight = sum(weight), .groups = "drop") %>%
filter(!duplicated(aux)) %>%
select(-aux)
#> # A tibble: 3 × 3
#> ego alter weight
#> <chr> <chr> <int>
#> 1 A B 22
#> 2 C D 7
#> 3 E F 13
And yet another solution, a bit more succinct:
library(tidyverse)
df %>%
group_by(aux = map2_chr(ego, alter, ~ sort(c(.x, .y)) %>% str_c(collapse = ""))) %>%
summarise(weight = sum(weight)) %>%
extract(aux, c("ego", "alter"), "([[:upper:]])([[:upper:]])")
#> # A tibble: 3 × 3
#> ego alter weight
#> <chr> <chr> <int>
#> 1 A B 22
#> 2 C D 7
#> 3 E F 13