I want to get the standard deviation of specific columns in a dataframe and store those standard deviations in a list in R.
The names of the columns are stored in a vector whose contents depend on user input. For those variables I want to calculate the standard deviation and store the results in a list, which I can then loop over in another part of my code.
I tried as follows, e.g.:
specific_variables <- c("variable1", "variable2") # can be of a different length depending on user input
data <- data.frame(...) # a dataframe with multiple columns, including "variable1" and "variable2"
sd_list <- 0 # empty variable for storage purposes
# for loop over the variables
for (i in length(specific_variables)) {
sd_list[i] <- sd(data$specific_variables[i], na.rm = TRUE)
}
print(sd_list)
I get an error.
Second attempt using colSds and sapply:
colSds(data[sapply(specific_variables, na.rm = TRUE)])
But the colSds function doesn't work (anymore?).
Ideally, I'd like to store the standard deviations of the named columns in a list.
CodePudding user response:
Let's assume you have a dataframe with two columns. The easiest way is to use apply:
frame<-data.frame(X=1:6,Y=rnorm(6))
sd_list<-apply(frame,2,sd)
The "2" in apply means: calculate the standard deviation for each column. A "1" would mean: calculate it for each row.
There is no colSds() function in base R (one is provided by the matrixStats package), but colMeans() and colSums() do exist in base R.
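For the case in the question, where only some of the columns are wanted, the same idea can be sketched with sapply on a subset of the dataframe. The data below is made up for illustration; only the name specific_variables and the two column names come from the question.

```r
# Hypothetical example data; the column names mirror the question.
data <- data.frame(
  variable1 = c(1, 2, 3, NA),
  variable2 = c(2, 4, 6, 8),
  other     = c(10, 20, 30, 40)
)
specific_variables <- c("variable1", "variable2")

# Subset the columns by name, then apply sd() to each selected column.
# Extra arguments such as na.rm are passed through to sd().
sds <- sapply(data[specific_variables], sd, na.rm = TRUE)

# sds is a named numeric vector; as.list() turns it into a named list.
sd_list <- as.list(sds)
print(sd_list)
```

Because a dataframe is a list of columns, sapply visits each selected column directly, so no margin argument is needed.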
CodePudding user response:
With the help of @shghm, I found a way:
sd_list <- as.list(unname(apply(data[specific_variables], 2, sd, na.rm = TRUE)))
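The for loop from the question can probably be made to work as well. The sketch below assumes a small made-up dataframe. It differs from the original attempt in two ways: the loop runs over seq_along(specific_variables) rather than length(specific_variables), which yields only the last index, and the column is picked with [[ ]], because data$specific_variables looks for a column literally named "specific_variables".

```r
# Hypothetical example data; the column names mirror the question.
data <- data.frame(variable1 = c(1, 2, 3, NA), variable2 = c(2, 4, 6, 8))
specific_variables <- c("variable1", "variable2")

# Pre-allocate a list with one slot per requested variable.
sd_list <- vector("list", length(specific_variables))
names(sd_list) <- specific_variables

# seq_along() gives 1, 2, ..., n, so every variable is visited.
for (i in seq_along(specific_variables)) {
  col <- specific_variables[i]
  # [[ ]] selects the column whose name is stored in col.
  sd_list[[i]] <- sd(data[[col]], na.rm = TRUE)
}

# Loop over the stored results, e.g. for use elsewhere in the code.
for (name in names(sd_list)) {
  cat(name, ":", sd_list[[name]], "\n")
}
```

Keeping the names on the list lets later code look up a result by column name instead of by position.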