I know how to compute the sd using summarize:
library(dplyr)
ans <- temp %>% group_by(permno) %>% summarise(std = sd(ret))
But how do I compute the standard deviation when I already know the mean is 0?
In other words, I know the true mean and want to use that instead of using the sample mean while computing the sd.
One way would be to manually code the sd function, but I need it to work for each group, so I'm stuck.
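For reference, the quantity being asked for replaces the sample mean with the known mean (mu, here 0). Because nothing is estimated from the data, it divides by n rather than n - 1, which is what both answers do. R's sd() computes the second expression, s:

```latex
\hat{\sigma}_{\mu} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
\qquad\text{vs.}\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

With mu = 0, the first expression reduces to the root mean square of the data, sqrt(mean(x^2)).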
Answer:
It is always best to provide reproducible data. Here is an example with the iris data set:
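library(dplyr)
data(iris)
GM <- mean(iris$Sepal.Length) # stands in for the known "population mean"
# Divide by n (not n - 1), then take the square root so the result is an sd, not a variance
ans <- iris %>% group_by(Species) %>% summarise(std = sqrt(sum((Sepal.Length - GM)^2) / length(Sepal.Length)))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.907
# 2 versicolor 0.519
# 3 virginica  0.975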
As compared with computing the sd with each group mean:
ans <- iris %>% group_by(Species) %>% summarise(std = sd(Sepal.Length))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.352
# 2 versicolor 0.516
# 3 virginica  0.636
Note that sd() uses n - 1 in the denominator because it estimates the mean from the same data. Since you indicated that your mean is a known population mean, we divide by n instead.
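Applied to the question's own setup (group by permno, known mean 0), the same approach might look like this minimal sketch. The `temp` data frame is hypothetical and exists only to make the example runnable. The column names permno and ret come from the question.

```r
library(dplyr)

# Hypothetical stand-in for the question's `temp` data:
# `permno` identifies the group, `ret` holds the returns.
temp <- data.frame(
  permno = c(1, 1, 1, 2, 2, 2),
  ret    = c(0.01, -0.02, 0.03, 0.05, -0.04, 0.02)
)

mu <- 0  # known (population) mean, as stated in the question

ans <- temp %>%
  group_by(permno) %>%
  # mean() divides by n; squared deviations are taken from mu, not the group mean
  summarise(std = sqrt(mean((ret - mu)^2)))
ans
```

With mu = 0 the expression is simply sqrt(mean(ret^2)). If ret can contain NA values, you would need to drop them first, for example with mean(..., na.rm = TRUE).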
Answer:
I came up with this solution:
sd_fn <- function(x, mean_pop) {
  # Root mean squared deviation from the known mean (divides by n, not n - 1)
  sd_f <- sqrt(sum((x - mean_pop)^2) / length(x))
  sd_f
}
x <- c(1, 2, 3, -1, -1.5, -2.8)
mean_pop <- 0
sd_fn(x, mean_pop)
I created a function whose arguments are a numeric vector and the population mean you already know. Pass in the data vector and the known mean, and the function returns the desired standard deviation.
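To cover the per-group requirement from the question, sd_fn can be called inside summarise(), which evaluates it once per group. This is a minimal sketch. The data frame is hypothetical and simply splits the values of x above into two made-up permno groups.

```r
library(dplyr)

# Same function as above: sd around a known mean, dividing by n
sd_fn <- function(x, mean_pop) {
  sqrt(sum((x - mean_pop)^2) / length(x))
}

# Hypothetical grouped data in the shape of the question's `temp`
temp <- data.frame(
  permno = c(1, 1, 1, 2, 2, 2),
  ret    = c(1, 2, 3, -1, -1.5, -2.8)
)

# summarise() passes each group's `ret` values to sd_fn separately
ans <- temp %>%
  group_by(permno) %>%
  summarise(std = sd_fn(ret, mean_pop = 0))
ans
```

The function itself needs no changes for grouping. group_by() and summarise() handle the splitting, so any per-vector summary function works this way.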