Mean and standard deviation of the values >= 5 among 10,000 draws from binomial(10, 1/4)

I have a data set of 10,000 points generated by:

data = rbinom(10000, size=10, prob=1/4)

I need to find the mean and standard deviation of the data values >=5.

There are approx 766 values as per:

sum(data >=5)

The comparison data >= 5 (like every other approach I can think of) produces TRUE/FALSE values rather than the numbers themselves, so I can't pass it to a mean or sd calculation. How do I extract the actual values?

CodePudding user response：

To get the values of data that are greater than or equal to 5, rather than a logical vector indicating which values meet that condition, index data with the condition itself: data[data >= 5].

So we can do:

data = rbinom(10000, size=10, prob=1/4)

mean(data[data >= 5])
#> [1] 5.298153

sd(data[data >= 5])
#> [1] 0.5567141
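
As a sanity check on these sample figures, the mean and standard deviation of X given X >= 5 can also be computed exactly from the binomial probabilities. The following is only a sketch and not part of the original answer; the names k, p, cond_mean, and cond_sd are illustrative.

```r
# Exact conditional moments of X | X >= 5 for X ~ Binomial(10, 1/4)
k <- 5:10                                # the values that satisfy X >= 5
p <- dbinom(k, size = 10, prob = 1/4)    # P(X = k) for each of those values
p <- p / sum(p)                          # renormalise to P(X = k | X >= 5)

cond_mean <- sum(k * p)                        # E[X | X >= 5]
cond_sd   <- sqrt(sum((k - cond_mean)^2 * p))  # SD of X given X >= 5

cond_mean
cond_sd
```

Results from a simulated sample should land close to these values, with small differences from sampling noise and from sd() dividing by n - 1.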

CodePudding user response：

Maybe try this:

library(dplyr)

# Store the vector in a named column so the pipeline does not rely on a column called "."
data.frame(x = data) %>%
  filter(x >= 5) %>%
  summarise(mean = mean(x),
            sd = sd(x))

Output:

      mean        sd
1 5.297092 0.5815554

Data

data = rbinom(10000, size=10, prob=1/4)
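
The figures here differ slightly from those in the previous answer because rbinom() draws fresh random numbers on every call. A minimal sketch of making the draw reproducible, assuming an arbitrary seed of 123 (the seed and the names a and b are not from the original posts):

```r
# Resetting the seed before each call makes rbinom() return the same draws
set.seed(123)
a <- rbinom(10000, size = 10, prob = 1/4)

set.seed(123)
b <- rbinom(10000, size = 10, prob = 1/4)

identical(a, b)   # same seed, same draws

# The summary statistics of the subset are therefore reproducible too
mean(a[a >= 5]) == mean(b[b >= 5])
```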

CodePudding user response：

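TRUE and FALSE can be used directly in mean(), sum(), sd(), etc., because they are coerced to the numeric values 1 and 0, respectively. Note that this summarises the logical vector itself: mean(data >= 5) is the proportion of values that are >= 5 and sum(data >= 5) is their count, not the mean or sum of those values.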

set.seed(456)
data = rbinom(10000, size=10, prob=1/4)
mean(data >= 5)
#> [1] 0.0779
sum(data >= 5)
#> [1] 779
sd(data >= 5)
#> [1] 0.2680276

Created on 2022-05-14 by the reprex package (v2.0.1)
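
To tie this back to the question, the sketch below contrasts summarising the logical vector with summarising the values it selects. It reuses the seed from this answer; the names is_big, prop_big, big_values, mean_big, and sd_big are illustrative assumptions.

```r
set.seed(456)
data <- rbinom(10000, size = 10, prob = 1/4)

is_big <- data >= 5            # logical vector: TRUE where the value is >= 5

# Summaries of the logical vector (TRUE counts as 1, FALSE as 0)
prop_big  <- mean(is_big)      # proportion of values that are >= 5
count_big <- sum(is_big)       # number of values that are >= 5

# Summaries of the selected values themselves
big_values <- data[is_big]     # logical indexing keeps only the TRUE positions
mean_big   <- mean(big_values) # mean of the values >= 5
sd_big     <- sd(big_values)   # standard deviation of the values >= 5
```

Here mean_big and sd_big are the quantities the question asks for, while prop_big and count_big describe how often the condition holds.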