How to get standard deviation of multiple columns in R?

Time:10-27

I want to get the standard deviation of specific columns in a data frame and store those standard deviations in a list in R.

The names of the columns are stored in a vector whose contents depend on user input. For those columns I want to calculate the standard deviation and store the results in a list that I can then loop over in another part of my code.

I tried as follows, e.g.:

specific_variables <- c("variable1", "variable2")  # can be of a different length depending on user input
data <- data.frame(...)  # this is a dataframe with multiple columns, of which "variable1" and "variable2" are both columns from
sd_list <- 0  # empty variable for storage purposes

# for loop over the variables
for (i in length(specific_variables)) {
  sd_list[i] <- sd(data$specific_variables[i], na.rm = TRUE)
}

print(sd_list)

I get an error.

Second attempt using colSds and sapply:

colSds(data[sapply(specific_variables, na.rm = TRUE)])

But the colSds function doesn't work (anymore?).

Ideally, I'd like to store the standard deviations of the columns named in the vector in a list.

CodePudding user response:

Let's assume you have a data frame with two columns. The easiest way is to use apply():

frame <- data.frame(X = 1:6, Y = rnorm(6))
sd_list <- apply(frame, 2, sd)

The "2" in apply() is the margin argument: it means the standard deviation is calculated for each column. A "1" would mean it is calculated for each row.
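
To make the margin argument concrete, here is a minimal sketch on a small matrix. The matrix m and its values are made up for illustration and are not taken from the question.

```r
# Hypothetical 3 x 2 matrix, used only to illustrate the margin argument
m <- cbind(X = c(1, 2, 3), Y = c(2, 4, 6))

# Margin 2: apply sd() to each column, giving one value per column (named X and Y)
col_sds <- apply(m, 2, sd)

# Margin 1: apply sd() to each row, giving one value per row
row_sds <- apply(m, 1, sd)

print(col_sds)
print(row_sds)
```

Note that apply() converts a data frame to a matrix first, so this approach is only safe when all selected columns are numeric.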

Base R has no colSds() function (it comes from the matrixStats package), but colMeans() and colSums() do exist in base R.
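
As for the loop in the question, two things likely go wrong: for (i in length(x)) visits only the last index instead of every index, and data$specific_variables[i] looks for a column literally named "specific_variables" rather than the name stored in the vector. A possible corrected version is sketched below. The example data frame is hypothetical and only stands in for the asker's data.

```r
# Hypothetical example data standing in for the question's `data`
data <- data.frame(
  variable1 = c(1, 2, 3, NA),
  variable2 = c(2, 4, 6, 8),
  other     = letters[1:4]
)
specific_variables <- c("variable1", "variable2")

# Pre-allocate a list with one slot per requested column
sd_list <- vector("list", length(specific_variables))

# seq_along() yields 1, 2, ..., whereas length() alone yields a single number
for (i in seq_along(specific_variables)) {
  # [[ ]] selects the column whose name is stored in specific_variables[i]
  sd_list[[i]] <- sd(data[[specific_variables[i]]], na.rm = TRUE)
}

print(sd_list)
```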

CodePudding user response:

With the help of @shghm, I found a way:

sd_list <- as.list(unname(apply(data[specific_variables], 2, sd, na.rm = TRUE)))
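
If keeping the column names is useful when looping over the result later, one alternative is lapply(), which returns a list named after the selected columns. This is a sketch under the assumption that all selected columns are numeric. The example data is made up.

```r
# Hypothetical example data standing in for the question's `data`
data <- data.frame(
  variable1 = c(1, 2, 3, NA),
  variable2 = c(2, 4, 6, 8)
)
specific_variables <- c("variable1", "variable2")

# A data frame is a list of columns, so lapply() runs sd() on each selected column
sd_list <- lapply(data[specific_variables], sd, na.rm = TRUE)

# Loop over the named list, e.g. to use each value elsewhere in the code
for (name in names(sd_list)) {
  cat(name, ":", sd_list[[name]], "\n")
}
```

Unlike the apply() version, this does not convert the data frame to a matrix, and dropping unname() keeps the link between each value and its column.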
