Z score calculation

Time:01-17

I'm having trouble calculating the z-scores of my dataset, which is called comb22.

Here is my code:

z_scores <- as.data.frame(sapply(comb22, function(comb22) (abs(comb22-mean(comb22))/sd(comb22))))

Here is the template code I am filling in:

 z_scores <- as.data.frame(sapply(df, function(df) (abs(df-mean(df))/sd(df))))

When I run this code, I get the error "Error in comb22 - mean(comb22) : non-numeric argument to binary operator"

I apologize if this is obvious, but where is the error in my code? I've also tried calculating the z-scores of just one column by using comb$Humidity (instead of comb22).

Answer:

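The error message "non-numeric argument to binary operator" means that an arithmetic operator (here the - in comb22 - mean(comb22)) was given something that is not a number. In practice, at least one column of your data frame is stored as character rather than numeric, so sapply() fails as soon as it reaches that column. Because all values in a data frame column share one type, a single non-numeric entry (a quoted number, a text label, a stray symbol) is enough to make the whole column character.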
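To find out which entries are causing the problem, you can try converting each non-numeric column and report the values that fail to parse. This is a sketch: find_non_numeric() is a hypothetical helper, and the example data frame (including column D) is made up for illustration. On your own data you would call find_non_numeric(comb22).

```r
# Hypothetical helper: for each non-numeric column, list the values
# that as.numeric() cannot parse (they would silently become NA)
find_non_numeric <- function(df) {
  out <- list()
  for (name in names(df)) {
    col <- df[[name]]
    if (is.numeric(col)) next                  # already numeric: nothing to report
    txt <- as.character(col)                   # handles character and factor columns
    converted <- suppressWarnings(as.numeric(txt))
    bad <- is.na(converted) & !is.na(txt)      # NA only because parsing failed
    out[[name]] <- unique(txt[bad])
  }
  out
}

# Made-up example: A holds numbers stored as text, D holds real text values
df <- data.frame(A = c(1, 2, '3', 4), B = c(5, 6, 7, 8), D = c("x", "2", "y", "4"))
res <- find_non_numeric(df)
res
```

An empty result for a column (like A here) means every value parses, so converting it with as.numeric() is enough. A non-empty result (like D) means the column contains real text that has to be cleaned or excluded before computing z-scores.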

One way to fix this is to check the type of each column and make sure the columns you want to standardise are numeric. Applying class() to every column with sapply() shows the type of each one:

# Running the following gives the error: the quoted '3' in column A forces the
# whole column to be stored as character.
 
# Create a sample data frame with a non-numeric value in column A
df <- data.frame(A = c(1, 2, '3', 4), B = c(5, 6, 7, 8), C = c(9, 10, 11, 12))

sapply(df,class)

#          A           B           C 
#"character"   "numeric"   "numeric"

# Use sapply to calculate the z-scores for each column
z_scores <- sapply(df, function(x) (x - mean(x, na.rm = TRUE))/sd(x, na.rm = TRUE))

#Error in x - mean(x, na.rm = TRUE) : non-numeric argument to binary operator
#In addition: Warning message:
#In mean.default(x, na.rm = TRUE) : argument is not numeric or logical: returning NA

You can convert the columns to numeric with as.numeric():

df[ , c("A","B","C")] <- sapply(df[ , c("A","B","C")], as.numeric)

sapply(df,class)

#       A         B         C 
#"numeric" "numeric" "numeric" 
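One caveat when applying this to your own data: if a column was read in as a factor (for example with read.csv(..., stringsAsFactors = TRUE), or by default in R versions before 4.0), as.numeric() returns the factor's internal level codes rather than the numbers you see. Converting through as.character() first avoids this. The factor f, the data frame, and the helper to_num() below are illustrative, not part of the original data.

```r
# A numeric-looking column stored as a factor (illustrative data)
f <- factor(c("10", "20", "5", "20"))

codes  <- as.numeric(f)                 # level codes 1 2 3 2 (levels sort as "10" "20" "5")
values <- as.numeric(as.character(f))   # actual values 10 20 5 20

# Hypothetical helper: convert factors via their labels, everything else directly
to_num <- function(col) {
  if (is.factor(col)) as.numeric(as.character(col)) else as.numeric(col)
}

df <- data.frame(A = factor(c("10", "20", "5", "20")), B = c(5, 6, 7, 8))
df[] <- lapply(df, to_num)   # df[] <- ... keeps df a data frame
sapply(df, class)
```

Numeric columns are passed to as.numeric() directly rather than through as.character(), since a character round trip can drop digits from arbitrary doubles.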


# Use sapply to calculate the z-scores for each column
z_scores <- sapply(df, function(x) (x - mean(x, na.rm = TRUE))/sd(x, na.rm = TRUE))

# Print the result
print(z_scores)

#              A          B          C
#[1,] -1.1618950 -1.1618950 -1.1618950
#[2,] -0.3872983 -0.3872983 -0.3872983
#[3,]  0.3872983  0.3872983  0.3872983
#[4,]  1.1618950  1.1618950  1.1618950
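As a check on this output, the values for column A can be worked out by hand. R's sd() is the sample standard deviation (denominator n - 1):

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
$$

$$
A = (1, 2, 3, 4):\quad \bar{x} = 2.5, \quad s = \sqrt{\frac{2.25 + 0.25 + 0.25 + 2.25}{3}} = \sqrt{\tfrac{5}{3}} \approx 1.2910
$$

$$
z_1 = \frac{1 - 2.5}{1.2910} \approx -1.1619, \qquad z_2 = \frac{2 - 2.5}{1.2910} \approx -0.3873
$$

Columns B and C are column A shifted by 4 and 8, and shifting a column does not change its z-scores, which is why all three columns print the same values. The code in the question also wraps the result in abs(), so it would return the absolute values |z_i| instead.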

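If comb22 also contains columns that are genuinely not numbers (for example an ID, a date, or a location name), converting them is not meaningful, and it is simpler to compute z-scores for the numeric columns only. The sketch below does that and compares the manual formula with base R's scale(), which centres each column on its mean and divides by its standard deviation. The data frame and its column names (station, Humidity, Temp) are made up for illustration.

```r
# Illustrative data: one text column plus two numeric columns
df <- data.frame(
  station  = c("a", "b", "c", "d"),
  Humidity = c(1, 2, 3, 4),
  Temp     = c(5, 6, 7, 8)
)

num_cols <- sapply(df, is.numeric)   # logical vector: which columns are numeric

# Manual z-scores, numeric columns only
z_manual <- sapply(df[num_cols], function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))

# Same values with base R's scale() (it also stores centre/scale as attributes)
z_scale <- scale(df[num_cols])

all.equal(z_manual, z_scale, check.attributes = FALSE)   # TRUE

# Absolute z-scores, as the question's code computes them
abs_z <- as.data.frame(abs(z_scale))
```

Selecting with df[num_cols] (rather than df[, num_cols]) keeps the result a data frame even when only one column is numeric.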
