I have a data frame that looks like this:
iso_o iso_d value
1 ABW AFG 5
2 AFG ABW 100
3 NDL ALB 17
4 NDL ARE 4758
5 ALB NDL 13
6 ABW ARM 10
I now want to combine all occurrences of pairs in the "iso_o" and "iso_d" columns to and sum up the value they have in "value". So for example, the pair ABW - AFG should only occur one time with the value 105, NDL - ALB also only one time with the value 30. Like this:
iso_o iso_d value
1 ABW AFG 105
2 NDL ALB 30
3 NDL ARE 4758
4 ABW ARM 10
Any ideas on how to do this?
CodePudding user response:
base R
l <- apply(df[1:2], 1, \(x) toString(sort(x)))
agg <- aggregate(value ~ match(l, l),
data = df, FUN = sum)
cbind(df[agg[[1]], 1:2], agg[2])
# iso_o iso_d value
# 1 ABW AFG 105
# 3 NDL ALB 30
# 4 NDL ARE 4758
# 6 ABW ARM 10
dplyr
You can group_by
the rowwise
sort
ed values of iso_o
and iso_d
:
library(dplyr)
df %>%
rowwise() %>%
mutate(sorted = list(sort(c(iso_o, iso_d)))) %>%
group_by(sorted) %>%
summarise(across(iso_o:iso_d, first),
value = sum(value)) %>%
select(-sorted)
# A tibble: 4 × 3
iso_o iso_d value
<chr> <chr> <int>
1 ABW AFG 105
2 NDL ALB 30
3 NDL ARE 4758
4 ABW ARM 10
CodePudding user response:
Here is a tidyverse approach:
df %>% group_by(iso_o, iso_d) %>% mutate(value = sum(value)) %>% ungroup() %>% distinct()