I have a data range of 10,000 points as per:
data = rbinom(10000, size=10, prob=1/4)
I need to find the mean and standard deviation of the data values >= 5.
There are approx 766 values as per:
sum(data >=5)
sum (or any other approach I can think of) produces a TRUE/FALSE and cannot be used within a mean or sd calculation. How do I divide up the actual values?!
CodePudding user response:
If you want to get all the values of data which are greater than or equal to 5, rather than just a logical vector telling you if the values of data are greater than or equal to 5, you need to do data[data >= 5].
So we can do:
data = rbinom(10000, size=10, prob=1/4)
mean(data[data >= 5])
#> [1] 5.298153
sd(data[data >= 5])
#> [1] 0.5567141
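As a small variation on the same idea, the logical index can be stored once and reused. This is only a sketch: the names keep and big are made up for illustration, and the seed is arbitrary and only there so the run is repeatable.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
data <- rbinom(10000, size = 10, prob = 1/4)

keep <- data >= 5   # logical vector: TRUE where the value is at least 5
big  <- data[keep]  # the actual values, not TRUE/FALSE

# The subset has exactly as many elements as there are TRUEs in keep
same_count <- length(big) == sum(keep)

mean_big <- mean(big)  # mean of the values that are >= 5
sd_big   <- sd(big)    # standard deviation of those same values
```

Because big holds the numbers themselves, any other summary (median, range, table) can be applied to it in the same way.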
CodePudding user response:
Maybe try this:
library(dplyr)
data %>%
  as.data.frame() %>%
  filter(. >= 5) %>%
  summarise(mean = mean(.),
            sd = sd(.))
Output:
      mean        sd
1 5.297092 0.5815554
Data
data = rbinom(10000, size=10, prob=1/4)
CodePudding user response:
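The TRUE and FALSE values can be used in mean(), sum(), sd(), etc., as they have numerical values 1 and 0, respectively. Note that this summarises the logical vector itself: mean(data >= 5) is the proportion of values that are at least 5, not the mean of those values.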
set.seed(456)
data = rbinom(10000, size=10, prob=1/4)
mean(data >= 5)
#> [1] 0.0779
sum(data >= 5)
#> [1] 779
sd(data >= 5)
#> [1] 0.2680276
Created on 2022-05-14 by the reprex package (v2.0.1)
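To make the difference concrete, here is a rough side-by-side of the two calculations. Applying mean() to the logical vector gives a proportion, while subsetting first gives the mean of the values themselves. The variable names (is_big, prop_big, mean_big, sd_big) are illustrative only.

```r
set.seed(456)
data <- rbinom(10000, size = 10, prob = 1/4)

is_big <- data >= 5  # logical vector, one TRUE/FALSE per element

# TRUE counts as 1 and FALSE as 0, so the mean is the share of TRUEs
prop_big <- mean(is_big)
same_prop <- isTRUE(all.equal(prop_big, sum(is_big) / length(data)))

# Indexing with the logical vector returns the matching values
mean_big <- mean(data[is_big])
sd_big   <- sd(data[is_big])
```

Here prop_big always lies between 0 and 1, whereas mean_big cannot be below 5, which is a quick way to tell which of the two quantities a given result represents.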