Turning a formula into a function

I have the following formula for relative estimation error that I am trying to turn into a function to measure the precision of measurements.

Where:

Y_estimated = the observed values in the test df

y_true = the true value that we are trying to estimate (denoted as p_true in code)

R = the number of observations in each iteration (n=3)

My data has the following format:

# dataframe
test<- iris[1:3,1:4]

# make a vector that shows the true population value for each column in the dataframe
p_true<- c(5, 3, 1, 0.3)

# function
estimate = function(df, y_true) {
  
  ((sqrt(sum((df - y_true) ^ 2)) / 3) / y_true) * 100 
}                                                           

y_true <- p_true

library(dplyr)  # provides %>% and group_modify()

final2 <- test %>%
  group_modify( ~ as.data.frame(estimate(., p_true)))

The goal is an output of 1 row with 4 precision estimates (one for each variable). Currently the output comes back as 1 column and 4 rows instead of 4 columns and 1 row. Beyond the format, I'm not sure the function is set up correctly, because the values are much more extreme than I expect.

If anyone can confirm whether my function is set up correctly and/or show how to get the output into the correct format, I would really appreciate the help.

CodePudding user response:

No. It does not do what you think it is doing. Look at the result of test - y_true. The values are not what you expect because you are subtracting a vector (y_true) from a data frame (test). R recycles the vector down the columns (column-major order) rather than matching one element to each column, so the first column becomes 5.1 - 5 = 0.1, 4.9 - 3 = 1.9, 4.7 - 1 = 3.7:

test - y_true
#   Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1          0.1         3.2          0.4        -2.8
# 2          1.9        -2.0          1.1        -0.8
# 3          3.7         0.2         -3.7        -0.1
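To make the recycling visible, here is a minimal sketch that builds by hand the matrix R effectively subtracts. The names recycled, naive and by_hand are only for illustration and are not part of the original code.

```r
# Sketch: reproduce by hand what `test - y_true` does.
test <- iris[1:3, 1:4]
y_true <- c(5, 3, 1, 0.3)

# The 4-element vector is repeated to 12 values and laid out column by column,
# so the first column is 5, 3, 1 instead of 5, 5, 5.
recycled <- matrix(rep_len(y_true, length.out = 12), nrow = 3)
print(recycled)

# Subtracting that matrix gives the same values as the naive subtraction.
naive <- test - y_true
by_hand <- as.matrix(test) - recycled
print(all.equal(unname(as.matrix(naive)), unname(by_hand)))
```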

There are several ways to handle this; here are three:

test - matrix(y_true, 3, 4, byrow=TRUE)
t(t(test) - y_true)
sweep(test, 2, y_true, "-")
# These will produce what you are expecting:
#   Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1          0.1         0.5          0.4        -0.1
# 2         -0.1         0.0          0.4        -0.1
# 3         -0.3         0.2          0.3        -0.1
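As a quick check, the three approaches can be compared on values alone. This is only a sketch; a, b and d are illustrative names. One difference worth knowing is that t(t(test) - y_true) returns a matrix, because t() converts a data frame to a matrix.

```r
test <- iris[1:3, 1:4]
y_true <- c(5, 3, 1, 0.3)

a <- test - matrix(y_true, 3, 4, byrow = TRUE)
b <- t(t(test) - y_true)
d <- sweep(test, 2, y_true, "-")

# Compare values only; dimnames are dropped because the classes differ.
same_ab <- isTRUE(all.equal(unname(as.matrix(a)), unname(b)))
same_ad <- isTRUE(all.equal(unname(as.matrix(a)), unname(as.matrix(d))))
print(c(same_ab, same_ad))

# b is a matrix, not a data frame
print(is.matrix(b))
```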

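You will also need to use colSums() instead of sum(), so that you get one sum of squares per column rather than a single total. Note that the line below also divides by 3 inside the square root (the mean of the squared errors), whereas the function in the question divides after taking the square root; check which one your formula calls for. With those changes the rest of your code should work: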

sqrt(colSums((test - matrix(y_true, 3, 4, byrow=TRUE))^2) / 3) / y_true * 100
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#     3.829708    10.363755    36.968455    33.333333
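To get the 1 row by 4 column output asked for in the question, the calculation can be wrapped in a function. The following is a minimal sketch, not a definitive implementation: the name ree is hypothetical, it assumes the division by the number of observations belongs inside the square root as in the line above, and it uses nrow(df) instead of a hard-coded 3.

```r
# Sketch: relative estimation error (in percent) per column, as one row.
# Assumes y_true has one value per column of df.
ree <- function(df, y_true) {
  stopifnot(ncol(df) == length(y_true))
  errors <- sweep(as.matrix(df), 2, y_true, "-")  # column j minus y_true[j]
  rmse <- sqrt(colSums(errors^2) / nrow(df))      # one value per column
  as.data.frame(t(rmse / y_true * 100))           # 1 row, one column per variable
}

test <- iris[1:3, 1:4]
p_true <- c(5, 3, 1, 0.3)

result <- ree(test, p_true)
print(result)
```

Because the result is already a one-row data frame, it can be used directly and no longer needs as.data.frame() or group_modify() for this ungrouped case.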