Home > Blockchain >  Aggregate R (absolute) difference
Aggregate R (absolute) difference

Time:03-10

I have a dataframe like this:

structure(list(from = c("China", "China", "Canada", "Canada", 
"USA", "China", "Trinidad and Tobago", "China", "USA", "USA"), 
    to = c("Japan", "Japan", "USA", "USA", "Japan", "USA", "USA", 
    "Rep. of Korea", "Canada", "Japan"), weight = c(4766781396, 
    4039683737, 3419468319, 3216051707, 2535151299, 2513604035, 
    2303474559, 2096033823, 2091906420, 2066357443)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
    from = c("Canada", "China", "China", "China", "Trinidad and Tobago", 
    "USA", "USA"), to = c("USA", "Japan", "Rep. of Korea", "USA", 
    "USA", "Canada", "Japan"), .rows = structure(list(3:4, 1:2, 
        8L, 6L, 7L, 9L, c(5L, 10L)), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .drop = TRUE))

I would like to perform the absolute value of difference in weight column grouped by from and to.

I'm trying with the function aggregate() but it seems to work for means and sums and not for difference. For example (df is the name of my dataframe):

aggregate(weight~from to, data = df, FUN=mean)

which produces:

                 from            to     weight
1                 USA        Canada 2091906420
2               China         Japan 4403232567
3                 USA         Japan 2300754371
4               China Rep. of Korea 2096033823
5              Canada           USA 3317760013
6               China           USA 2513604035
7 Trinidad and Tobago           USA 2303474559

EDIT. The desired result is instead

                 from            to     weight
1                 USA        Canada 2091906420
2               China         Japan 727097659
3                 USA         Japan 468793856
4               China Rep. of Korea 2096033823
5              Canada           USA 203416612
6               China           USA 2513604035
7 Trinidad and Tobago           USA 2303474559

As we can see, the countries that appear two times in the columns from and to colllapsed in only one row with the difference between weights in the column weight. E.g.,

from            to            weight
China           Japan         4766781396
China           Japan         4039683737

become

from            to            weight
China           Japan         727097659

because

> 4766781396-4039683737
[1] 727097659

The difference should be positive (and this is why I wrote "the absolute value of difference of the weights").

The couples of countries which instead appear just in one row of dataframe df remain the same, as e.g.

                 from            to     weight
7 Trinidad and Tobago           USA 2303474559

CodePudding user response:

Assuming at most 2 values per group and that the order of the difference is not important

aggregate(weight~from to, data=df, FUN=function(x){
  abs(ifelse(length(x)==1,x,diff(x)))
})

                 from            to     weight
1                 USA        Canada 2091906420
2               China         Japan  727097659
3                 USA         Japan  468793856
4               China Rep. of Korea 2096033823
5              Canada           USA  203416612
6               China           USA 2513604035
7 Trinidad and Tobago           USA 2303474559

CodePudding user response:

Is the following what you are looking for?

f <- function(x) abs(x[2] - x[1])
aggregate(weight ~ from   to, data = df, FUN = f)

#>                  from            to    weight
#> 1                 USA        Canada        NA
#> 2               China         Japan 727097659
#> 3                 USA         Japan 468793856
#> 4               China Rep. of Korea        NA
#> 5              Canada           USA 203416612
#> 6               China           USA        NA
#> 7 Trinidad and Tobago           USA        NA
  • Related