I have a dataframe
like this:
structure(list(from = c("China", "China", "Canada", "Canada",
"USA", "China", "Trinidad and Tobago", "China", "USA", "USA"),
to = c("Japan", "Japan", "USA", "USA", "Japan", "USA", "USA",
"Rep. of Korea", "Canada", "Japan"), weight = c(4766781396,
4039683737, 3419468319, 3216051707, 2535151299, 2513604035,
2303474559, 2096033823, 2091906420, 2066357443)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
from = c("Canada", "China", "China", "China", "Trinidad and Tobago",
"USA", "USA"), to = c("USA", "Japan", "Rep. of Korea", "USA",
"USA", "Canada", "Japan"), .rows = structure(list(3:4, 1:2,
8L, 6L, 7L, 9L, c(5L, 10L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .drop = TRUE))
I would like to perform the absolute value of difference in weight
column grouped by from
and to
.
I'm trying with the function aggregate()
but it seems to work for means and sums and not for difference. For example (df
is the name of my dataframe
):
aggregate(weight~from to, data = df, FUN=mean)
which produces:
from to weight
1 USA Canada 2091906420
2 China Japan 4403232567
3 USA Japan 2300754371
4 China Rep. of Korea 2096033823
5 Canada USA 3317760013
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
EDIT. The desired result is instead
from to weight
1 USA Canada 2091906420
2 China Japan 727097659
3 USA Japan 468793856
4 China Rep. of Korea 2096033823
5 Canada USA 203416612
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
As we can see, the countries that appear two times in the columns from
and to
colllapsed in only one row with the difference between weights in the column weight
. E.g.,
from to weight
China Japan 4766781396
China Japan 4039683737
become
from to weight
China Japan 727097659
because
> 4766781396-4039683737
[1] 727097659
The difference should be positive (and this is why I wrote "the absolute value of difference of the weights").
The couples of countries which instead appear just in one row of dataframe df
remain the same, as e.g.
from to weight
7 Trinidad and Tobago USA 2303474559
CodePudding user response:
Assuming at most 2 values per group and that the order of the difference is not important
aggregate(weight~from to, data=df, FUN=function(x){
abs(ifelse(length(x)==1,x,diff(x)))
})
from to weight
1 USA Canada 2091906420
2 China Japan 727097659
3 USA Japan 468793856
4 China Rep. of Korea 2096033823
5 Canada USA 203416612
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
CodePudding user response:
Is the following what you are looking for?
f <- function(x) abs(x[2] - x[1])
aggregate(weight ~ from to, data = df, FUN = f)
#> from to weight
#> 1 USA Canada NA
#> 2 China Japan 727097659
#> 3 USA Japan 468793856
#> 4 China Rep. of Korea NA
#> 5 Canada USA 203416612
#> 6 China USA NA
#> 7 Trinidad and Tobago USA NA