Compute standard deviation with a manually set mean in R

I know how to compute the sd using summarize:

ans <- temp %>% group_by(permno) %>% summarise(std = sd(ret))

But how do I compute the standard deviation given I know the mean = 0?

In other words, I know the true mean and want to use that instead of using the sample mean while computing the sd.

One way would be to manually code the sd function, but I need it to work for each group, so I'm stuck.

CodePudding user response：

It is always best to provide reproducible data. Here is an example with the iris data set:

library(dplyr)
data(iris)
GM <- mean(iris$Sepal.Length)  # stand-in for the known "population mean"
# Root of the mean squared deviation from GM (divide by n, then take the square root)
ans <- iris %>% group_by(Species) %>% summarise(std = sqrt(sum((Sepal.Length - GM)^2) / length(Sepal.Length)))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.907
# 2 versicolor 0.519
# 3 virginica  0.975

As compared with computing the sd with each group mean:

ans <- iris %>% group_by(Species) %>% summarise(std = sd(Sepal.Length))
ans
# A tibble: 3 × 2
#   Species      std
#   <fct>      <dbl>
# 1 setosa     0.352
# 2 versicolor 0.516
# 3 virginica  0.636

Note that sd() uses n - 1 in the denominator because it estimates the mean from the sample. Since you indicated that your mean is a known population mean, we divide by n instead.

CodePudding user response：

I came up with this solution:

sd_fn <- function(x, mean_pop) {
  sd_f <- sqrt((sum((x-mean_pop)^2))/(length(x)))
  sd_f
}

x <- c(1,2,3,-1,-1.5,-2.8)
mean_pop <- 0

sd_fn(x, mean_pop)

I simply created a function whose arguments are a numeric vector and the population mean that you already know. Pass in the data vector and the population mean, and the function will give you the desired standard deviation.
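Since the question asks for this per group, here is a minimal sketch of how a function like the one above could be called inside summarise. The helper name sd_known_mean, its na.rm argument, and the toy temp data frame are assumptions made for illustration; only the column names permno and ret come from the question.

```r
library(dplyr)

# Hypothetical helper: standard deviation around a known mean `mu`.
# Divides by n (not n - 1) because the mean is not estimated from the data.
sd_known_mean <- function(x, mu = 0, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sqrt(mean((x - mu)^2))
}

# Toy data standing in for the question's `temp` (assumed columns: permno, ret)
temp <- data.frame(
  permno = c(1, 1, 1, 2, 2, 2),
  ret    = c(1, 2, 3, -1, -1.5, -2.8)
)

# summarise() calls the helper once per group, so no manual looping is needed
ans <- temp %>%
  group_by(permno) %>%
  summarise(std = sd_known_mean(ret, mu = 0))
ans
```

Any function that takes a vector and returns a single number can be used this way, so the same pattern should also work with sd_fn(ret, 0) from the answer above.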