I face the following problem. I have a dataset that I simplify this way
df <- tibble(ID_1=c("a","a", "a","a", "a", "b", "b", "b", "b"), ID_2 = c("a","b", "c","d", "e", "b", "a", "c", "d"),
other= c(1, "bu", 34, 3, 5, 6, "jud", 82, 9), exchange=c(1, 13, 34, 3, 5, 6, 8, 82, 9),
goods=c("wood","wood", "wood","wood", "wood", "steel", "steel", "steel", "steel"))
df
We have exchanges between multiple countries that have different IDs. I want to sum all the exchanges between country A and country B. Between country A and country C. etc. the goods exchanged. That is, in the end, I'd like to have
ID_1 // ID_2 // Sum_exchangeAB
ID_1 // ID_3 // Sum_exchangeBC
....
ID_n // ID_n // Sumexchangenn
How can I do that in Dplyr? I tried group_by (ID_1, ID_2) then aggregate. Or summarise.
I tried for example the following: dataset2 %>% group_by(ID_1, ID_2, exchange) %>% summarise(dataset2, sum_of_exchange=sum(exchange)) . But this gives me the sum of all the exchange (that is, it returns one value). Not the sum of exchange between two countries.
Thank you for your help, much appreciated!!
CodePudding user response:
You were close:
library(dplyr)
df %>% group_by(ID_1, ID_2) %>% summarise(Sum_Exchange = sum(exchange))
This yields:
# A tibble: 9 x 3
# Groups: ID_1 [2]
ID_1 ID_2 Sum_Exchange
<chr> <chr> <dbl>
1 a a 1
2 a b 13
3 a c 34
4 a d 3
5 a e 5
6 b a 8
7 b b 6
8 b c 82
9 b d 9
You can then remove identical IDs to omit AA, BB.
df %>%
group_by(ID_1, ID_2) %>%
summarise(Sum_Exchange = sum(exchange)) %>%
filter(ID_1 != ID_2)
add-on from comment
Combine dataframes with {dplyr}
's bind_rows(df1, df2)
.
Mind you that bind_rows()
will generate new columns for variables/columns that do not exist in the original data frame and fill cells of existing columns with no value with NA
.
You can force, certain values into the same column/variable by renaming such a variable.
For your problem at hand. Check what you want/need. The original data frame comes with other, goods
variables. Thus, these are not existing in the summarised data frame. Doing a bind_rows()
will populate these with NA. To get the Sum_Exchange
into the exchange
variable, rename that column.
CodePudding user response:
Is that what you are looking for?
df %>%
rowwise() %>%
mutate(new_id = paste0(str_sort(c(ID_1, ID_2)), collapse = '-')) %>%
group_by(new_id) %>%
summarize(exchange = sum(exchange))
which gives:
# A tibble: 8 x 2
new_id exchange
<chr> <dbl>
1 a-a 1
2 a-b 21
3 a-c 34
4 a-d 3
5 a-e 5
6 b-b 6
7 b-c 82
8 b-d 9
UPDATE: it seems the TO wants to append the sum to the original data which can be done with:
df %>%
rowwise() %>%
mutate(new_id = paste0(str_sort(c(ID_1, ID_2)), collapse = '-')) %>%
group_by(new_id) %>%
mutate(sum_exchange = sum(exchange)) %>%
ungroup()
which gives:
# A tibble: 9 x 7
ID_1 ID_2 other exchange goods new_id sum_exchange
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 a a 1 1 wood a-a 1
2 a b bu 13 wood a-b 21
3 a c 34 34 wood a-c 34
4 a d 3 3 wood a-d 3
5 a e 5 5 wood a-e 5
6 b b 6 6 steel b-b 6
7 b a jud 8 steel a-b 21
8 b c 82 82 steel b-c 82
9 b d 9 9 steel b-d 9