How to compute the mean of the differences between columns of two data frames, mean(df1$a - df2$b), in R

Time:10-18

My two data frames look like this:

> dput(head(df1,25))
structure(list(Date = structure(c(16644, 16645, 16646, 16647, 
16648, 16649, 16650, 16651, 16652, 16653, 16654, 16655, 16656, 
16657, 16658, 16659, 16660, 16661, 16662, 16663, 16664, 16665, 
16666, 16667, 16668), class = "Date"), AU = c(0.241392906920806, 
0.257591745069017, 0.263305712230276, NaN, 0.252892547032525, 
0.251771180928526, 0.249211746794207, 0.257289083109259, 0.205017582640463, 
0.20072274573488, 0.210154167590338, 0.207384553271337, 0.193725450540089, 
0.199282601988984, 0.216267134143314, 0.217052471451736, NaN, 
0.220703029531909, 0.2164619798534, 0.223442036108148, 0.22061326758891, 
NaN, 0.277777461504811, NaN, 0.200839628485262)), row.names = c(NA, 
-25L), class = c("tbl_df", "tbl", "data.frame"))

> dput(head(df2,25))
structure(list(UF1 = c(0.2559, 0.2565, 0.257, 0.2577, 0.2583, 
0.259, 0.2596, 0.2603, 0.2611, 0.2618, 0.2625, 0.2633, 0.2641, 
0.2649, 0.2657, 0.2665, 0.2674, 0.2682, 0.2691, 0.27, 0.2709, 
0.2718, 0.2727, 0.2736, 0.2745), UF2 = c(0.2597, 0.2602, 0.2608, 
0.2614, 0.2621, 0.2627, 0.2634, 0.2641, 0.2648, 0.2655, 0.2663, 
0.267, 0.2678, 0.2686, 0.2694, 0.2702, 0.2711, 0.2719, 0.2728, 
0.2737, 0.2745, 0.2754, 0.2763, 0.2773, 0.2782), UF3 = c(0.2912, 
0.2915, 0.2918, 0.2922, 0.2926, 0.293, 0.2934, 0.2938, 0.2943, 
0.2947, 0.2952, 0.2957, 0.2962, 0.2968, 0.2973, 0.2979, 0.2985, 
0.2991, 0.2997, 0.3003, 0.3009, 0.3016, 0.3022, 0.3029, 0.3035
), Date = structure(c(16644, 16645, 16646, 16647, 16648, 16649, 
16650, 16651, 16652, 16653, 16654, 16655, 16656, 16657, 16658, 
16659, 16660, 16661, 16662, 16663, 16664, 16665, 16666, 16667, 
16668), class = "Date")), row.names = c(NA, 25L), class = "data.frame")

I want to compute the mean of the differences between a column of one data frame and a column of another, i.e. mean(df1$AU - df2$UF). The closest I got to a solution is the following:

data.frame(mean = colMeans(df1$AU, na.rm = TRUE) - colMeans(df2$UF))

but I got this error:

Error in colMeans(df1$AU, na.rm = TRUE) : 
  'x' must be an array of at least two dimensions

The same code works when each data frame has only one column, but df2 has three or more columns that I want to compare against df1$AU, so I need a more efficient approach.
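A likely cause of the error, for what it's worth: colMeans() only accepts objects with at least two dimensions (a matrix or data frame), whereas df1$AU extracts a plain numeric vector, which has no dim attribute. The sketch below uses small made-up data (x and df are hypothetical names, not from the question) to show the difference; for a single column, mean() is the function to use.

```r
# Toy data (hypothetical) standing in for df1$AU: a plain numeric vector
x <- c(1, 2, NaN, 4)
dim(x)                      # NULL, so colMeans() has no columns to work on

# colMeans(x) fails with "'x' must be an array of at least two dimensions";
# try() captures the error instead of stopping the script
res <- try(colMeans(x, na.rm = TRUE), silent = TRUE)
inherits(res, "try-error")  # TRUE

# For one vector, mean() is the right tool; na.rm = TRUE also drops NaN
mean(x, na.rm = TRUE)       # 2.333333

# colMeans() does work on 2-D objects such as a data frame
df <- data.frame(a = c(1, 2, 3), b = c(4, NA, 6))
colMeans(df, na.rm = TRUE)  # a = 2, b = 5
```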

Any help will be much appreciated. Thank you.

Answer:

Assuming you want the mean of df1$AU minus the mean of each (numeric) column of df2, you can do this:

mean(df1$AU, na.rm = TRUE) - colMeans(df2[, 1:3], na.rm = TRUE)

This outputs one value per column of df2:

       UF1        UF2        UF3 
-0.0367389 -0.0404509 -0.0688949

I hope this is helpful.
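Note that this computes the difference of the means, which is not quite the mean of the differences asked for in the title: the first term averages only the non-missing AU values, while colMeans() on df2 averages every row. That is likely why this answer gives -0.0367 for UF1 while the row-wise approach in the other answer gives -0.0361. Restricting both vectors to the same complete rows makes the two quantities agree, because the mean is linear. A small sketch with made-up vectors (x, y and ok are hypothetical names):

```r
# Toy data (hypothetical): x has a missing value in row 2
x <- c(1, NA, 3)
y <- c(1, 10, 1)

# Difference of means over different rows: y keeps row 2, x drops it
mean(x, na.rm = TRUE) - mean(y)   # 2 - 4 = -2

# Keep only the rows where both values are present
ok <- complete.cases(x, y)        # TRUE FALSE TRUE
mean(x[ok] - y[ok])               # mean of c(0, 2) = 1
mean(x[ok]) - mean(y[ok])         # also 1: same rows, so both agree
```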

Answer:

Here are two base R functions that compute the mean of the row-wise differences. The second one is faster, since it avoids building an intermediate matrix with cbind() and na.omit().

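# Version 1: bind x and -y as columns, optionally drop incomplete rows,
# then each row sum is x - y
meanDiffs1 <- function(x, y, na.rm = TRUE){
  z <- if(na.rm) na.omit(cbind(x, -y)) else cbind(x, -y)
  mean(rowSums(z))
}
# Version 2: keep only the positions where neither x nor y is missing
# (is.na() is also TRUE for NaN), then average the differences directly
meanDiffs2 <- function(x, y, na.rm = TRUE){
  if(na.rm){
    ok <- !is.na(x) & !is.na(y)
    mean(x[ok] - y[ok])
  } else {
    mean(x - y)
  }
}

meanDiffs1(df1$AU, df2$UF1)
#[1] -0.0361429
meanDiffs2(df1$AU, df2$UF1)
#[1] -0.0361429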

To compute the mean difference between df1$AU and each of the df2$UF* columns, use sapply() (the \(y) lambda shorthand requires R 4.1.0 or later).

sapply(df2[1:3], \(y) meanDiffs2(df1$AU, y))
#        UF1         UF2         UF3 
#-0.03614290 -0.03986195 -0.06848576 
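If you prefer to avoid a helper function, a vectorised alternative (a sketch, not taken from either answer) is to let R recycle df1$AU down every column of df2 and then call colMeans() on the resulting data frame of differences, which is roughly what the colMeans() attempt in the question was aiming for. Rows where AU is NaN produce NaN differences, which na.rm = TRUE drops column by column. The data below is made-up toy data, not the question's.

```r
# Toy data (hypothetical), shaped like the question's df1 and df2
df1 <- data.frame(AU = c(0.24, NaN, 0.26, 0.25))
df2 <- data.frame(UF1 = c(0.25, 0.26, 0.27, 0.28),
                  UF2 = c(0.30, 0.31, 0.32, 0.33))

# Vector minus data frame: AU is recycled down each column,
# giving a data frame of row-wise differences AU - UF*
diffs <- df1$AU - df2

# Column means of the differences; the NaN row is skipped in each column
colMeans(diffs, na.rm = TRUE)   # UF1 about -0.0167, UF2 about -0.0667
```

On the question's data, colMeans(df1$AU - df2[1:3], na.rm = TRUE) should return the same three values as the sapply() call above, since both average AU - UF over the rows where neither value is missing.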