I have frequency data on 520 users. I want to calculate the overall mean and SD for each user, and later use those two values to derive the parameters for fitting a Beta distribution. I have tried a couple of methods. My data look like this:
Mfrq.df.2=structure(list(X = 1:6, User.ID = c(37593L, 38643L, 49433L, 60403L,
70923L, 85363L), V1 = c(9L, 3L, 4L, 80L, 19L, 0L), V2 = c(10L,
0L, 29L, 113L, 21L, 1L), V3 = c(5L, 2L, 17L, 77L, 7L, 2L), V4 = c(2L,
2L, 16L, 47L, 4L, 3L), V5 = c(2L, 10L, 16L, 40L, 1L, 8L), V6 = c(4L,
0L, 9L, 22L, 1L, 7L), V7 = c(6L, 8L, 9L, 8L, 0L, 6L), V8 = c(2L,
17L, 16L, 24L, 2L, 1L), V9 = c(3L, 20L, 7L, 30L, 0L, 4L), V10 = c(2L,
11L, 5L, 11L, 2L, 3L)), row.names = c(NA, 6L), class = "data.frame")
This was my first attempt at the mean and SD:
MidPoint.0=c(5,15,25,35,45,55,65,75,85,95)
record.beta.0= Mfrq.df.2 %>%
rowwise() %>%
mutate(Mean.Freq.0=sum((c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10))*MidPoint.0/sum(c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10)))) %>%
mutate(SD.Freq.0=sqrt(sum(MidPoint.0-Mean.Freq.0)**2*(c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10))/sum(c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10))-1))
This computes the mean for me, but the SD step fails with the following error:
Error in mutate(., SD.Freq.0 = sqrt(sum(MidPoint.0 - Mean.Freq.0)^2 * :
x `SD.Freq.0` must be size 1, not 10.
ℹ Did you mean: `SD.Freq.0 = list(sqrt(...))` ?
ℹ The error occurred in row 1.
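Then I tried the data in long format, with one row per user and midpoint (the rows for the first user are shown here):
Mfrq.df.2_long=structure(list(X = 1:10, User.ID = c(37593L, 37593L, 37593L,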
37593L, 37593L, 37593L, 37593L, 37593L, 37593L, 37593L), Value = c(9L,
10L, 5L, 2L, 2L, 4L, 6L, 2L, 3L, 2L), MidPoint = c(5, 15, 25,
35, 45, 55, 65, 75, 85, 95)), row.names = c(NA, 10L), class = "data.frame")
With this code:
record.beta <- Mfrq.df.2_long %>% data.frame %>%
group_by(User.ID) %>%
mutate(Mean.Freq=sum(Value*MidPoint)/sum(Value)) %>%
mutate(SD.Freq=sqrt(sum(MidPoint-Mean.Freq)**2*Value)/sum(Value-1))
But I realized this gives me a different SD value for each MidPoint row instead of one value per user. The calculation does seem to work properly when I code it for an individual user:
U37593.df=Mfrq.df.2_long[Mfrq.df.2_long$User.ID==37593,]
Mean=sum(U37593.df$MidPoint*U37593.df$Value)/sum(U37593.df$Value)
SD=sqrt(sum((U37593.df$MidPoint - Mean)**2*U37593.df$Value)/(sum(U37593.df$Value) - 1))
Is there any way that I can get ONE SD along with ONE mean for each user (User.ID)?
CodePudding user response:
With dplyr:
library(dplyr)
Mfrq.df.2 %>%
rowwise() %>%
mutate(mean = mean(c_across(cols = V1:V10))) %>%
mutate(sd = sd(c_across(cols = V1:V10)))
# A tibble: 6 x 14
# Rowwise:
X User.ID V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 mean sd
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl>
1 1 37593 9 10 5 2 2 4 6 2 3 2 4.5 2.99
2 2 38643 3 0 2 2 10 0 8 17 20 11 7.3 7.13
3 3 49433 4 29 17 16 16 9 9 16 7 5 12.8 7.54
4 4 60403 80 113 77 47 40 22 8 24 30 11 45.2 34.4
5 5 70923 19 21 7 4 1 1 0 2 0 2 5.7 7.83
6 6 85363 0 1 2 3 8 7 6 1 4 3 3.5 2.72
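Note that mean() and sd() here treat the ten counts themselves as the observations. If the counts are instead frequencies attached to the bin midpoints, as in the question, a grouped-data version on the long format could look like the sketch below. It assumes dplyr is installed; the name long.df and the two-user subset are only for illustration, with values copied from the question's data.

```r
library(dplyr)

# Long-format data for two of the users in the question:
# Value is the count observed in the bin whose centre is MidPoint.
MidPoint <- seq(5, 95, by = 10)
long.df <- data.frame(
  User.ID  = rep(c(37593L, 38643L), each = 10),
  Value    = c(9, 10, 5, 2, 2, 4, 6, 2, 3, 2,
               3, 0, 2, 2, 10, 0, 8, 17, 20, 11),
  MidPoint = rep(MidPoint, times = 2)
)

# summarise() collapses each group to a single row, whereas mutate()
# keeps one row per MidPoint.
per.user <- long.df %>%
  group_by(User.ID) %>%
  summarise(
    N         = sum(Value),                          # observations per user
    Mean.Freq = sum(Value * MidPoint) / N,           # weighted mean
    SD.Freq   = sqrt(sum(Value * (MidPoint - Mean.Freq)^2) / (N - 1))
  )

print(per.user)
```

Using summarise() rather than mutate() is what yields exactly one mean and one SD per User.ID.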
CodePudding user response:
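I realized my parentheses were misplaced. In my first attempt, sum() closed before the squaring and the multiplication by the counts, so the expression returned a vector of length 10 instead of a single value. The following is the answer to this question: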
MidPoint=c(5,15,25,35,45,55,65,75,85,95)
record.beta = Mfrq.df.2 %>%
rowwise() %>%
mutate(Mean=sum(MidPoint*(c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10)))/sum((c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10)))) %>%
mutate(SD=sqrt(sum((MidPoint - Mean)**2*(c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10)))/(sum((c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10))) - 1)))
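For 520 users, the same formulas can also be computed without rowwise() by treating the counts as a matrix. The following is a minimal base R sketch of that idea, not a replacement for the code above; wide.df is a hypothetical two-row subset of the question's data, and the columns are assumed to be named V1 to V10.

```r
# Two rows of the question's wide data (counts per bin, V1..V10).
wide.df <- data.frame(
  User.ID = c(37593L, 60403L),
  V1 = c(9, 80), V2 = c(10, 113), V3 = c(5, 77), V4 = c(2, 47), V5 = c(2, 40),
  V6 = c(4, 22), V7 = c(6, 8), V8 = c(2, 24), V9 = c(3, 30), V10 = c(2, 11)
)
MidPoint <- seq(5, 95, by = 10)

counts <- as.matrix(wide.df[paste0("V", 1:10)])  # users x bins
n      <- rowSums(counts)                         # observations per user

# Weighted mean: sum(count * midpoint) / sum(count), one value per user.
Mean <- as.vector(counts %*% MidPoint) / n

# Squared deviation of every midpoint from each user's mean (users x bins).
dev2 <- outer(Mean, MidPoint, function(m, x) (x - m)^2)

# Grouped-data sample SD with the n - 1 denominator.
SD <- sqrt(rowSums(counts * dev2) / (n - 1))

result <- data.frame(User.ID = wide.df$User.ID, Mean = Mean, SD = SD)
print(result)
```

A user whose counts sum to 1 would give a zero denominator in the SD, so such rows may need to be filtered out first.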