I'm trying to calculate p-values from Z values using the pnorm function.
I have a dataset. The dataset contains a Score per day (IATperDay), a day (here the first 50 days of one year), the mean score across days and the sd of the IAT scores across days. I created this dataset using this code:
dfP <- df %>%
dplyr::group_by(Time) %>%
summarize(IATperDay = mean(IAT_Score, na.rm = TRUE)) %>%
mutate(meanIAT = mean(IATperDay)) %>%
mutate(sdIAT = sd(IATperDay)) %>%
mutate(zscore = (IATperDay - meanIAT) / sdIAT)
Now, I have one z score per day. I want to calculate a p-value for each day (actually just for particular days, but that should not matter). I extract the z value for a day from the dataset, e.g., from day 39. The z score is .28. It is > 0, therefore we use lower.tail = FALSE Now, the two-tailed p-value I get from this is > 1, which is impossible. What am I doing wrong? Dataset is at the end of the post. The mean IAT = .30 and the SD = .03. For larger z values (e.g. 2), I get a possible p-value with lower.tail = FALSE, and an impossible one for lower.tail = TRUE. Hence, that should not be the cause (and logically it must be lower.tail = FALSE, because the z is bigger 0). Z values across my full dataset have mean 0 and SD 1, so that is also correct.
zscore39th <- dfP$zscore[dfP$Time == 39]
pvalue39th <- 2*pnorm(zscore39th, dfP$meanIAT, dfP$sdIAT, lower.tail = FALSE)
Dataset:
dfP <- structure(list(Time = 1:50, IATperDay = c(0.410879358218113,
0.311747574422159, 0.347005504763568, 0.357432731378968, 0.327206457506122,
0.325823694431151, 0.322558512273962, 0.353998763179494, 0.345998499422222,
0.343135975926421, 0.316900599513761, 0.35029336278125, 0.300125358669987,
0.345985251370012, 0.324138921118391, 0.34967331753911, 0.361544753274725,
0.35559678543985, 0.329797640858657, 0.347581891776735, 0.336918439625758,
0.332354261022726, 0.363049139318431, 0.373664851219269, 0.333226454434049,
0.343660654233193, 0.342454356593047, 0.337302292928455, 0.31144738305403,
0.345075755538591, 0.313879449248438, 0.341799525573947, 0.331983508130602,
0.334468280052049, 0.357900471182715, 0.343641028346217, 0.345370026382199,
0.312915322731732, 0.317338241827508, 0.338024235581454, 0.327430851742063,
0.346969868897998, 0.306530308207684, 0.321491212473958, 0.36527845791548,
0.340268008588983, 0.34432471059434, 0.328157620515314, 0.304279396778481,
0.32783869517467), meanIAT = c(0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957,
0.307794000883957, 0.307794000883957, 0.307794000883957, 0.307794000883957
), sdIAT = c(0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038, 0.0339300122669038,
0.0339300122669038, 0.0339300122669038, 0.0339300122669038),
zscore = c(3.03817624713059, 0.116521429674178, 1.15565840563692,
1.46297413937072, 0.572132319595447, 0.531378928052463, 0.435146066964642,
1.36176674302612, 1.12597950857599, 1.04161397775085, 0.268393614425175,
1.25255957949501, -0.226013540863053, 1.12558905624999, 0.481724560128664,
1.23428533788074, 1.58416542758514, 1.40886434640406, 0.648500796333739,
1.17264593303991, 0.858368058127986, 0.723850611829175, 1.62850334387799,
1.94137419748488, 0.749556273367455, 1.05707752378922, 1.02152499788216,
0.869681148723951, 0.107674059806836, 1.09878400164917, 0.17935296682509,
1.00222553480064, 0.712923622201011, 0.786155895207585, 1.47675956921576,
1.05649910115789, 1.10745687925657, 0.150937813033768, 0.281291998024449,
0.890958554912883, 0.578745763592322, 1.15460812999041, -0.0372440972414542,
0.403690145534145, 1.69420678599623, 0.957088003669907, 1.07664888014249,
0.60016540728519, -0.103583932650205, 0.590765901675347)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
You are actually doing the standardization twice, once when you compute the z-score and then again by specifying the mean and standard deviation in pnorm()
. So just remove the two parameters from pnorm()
and use lower.tail=T
when the z-score is negative and lower.tail=F
otherwise:
2*mapply(pnorm, dfP$zscore, lower.tail=dfP$zscore<0)
# [1] 0.002380147 0.907239303 0.247820908 0.143474455 0.567232335 0.595156221
# [7] 0.663456407 0.173271510 0.260174193 0.297590686 0.788396354 0.210366033
# [13] 0.821190889 0.260339502 0.630001627 0.217096598 0.113156079 0.158875284
# [19] 0.516661096 0.240937821 0.390689260 0.469157453 0.103418201 0.052212906
# [25] 0.453521995 0.290476208 0.307005772 0.384474677 0.914254247 0.271862292
# [31] 0.857660560 0.316234678 0.475893021 0.431776161 0.139740098 0.290740251
# [37] 0.268096446 0.880024772 0.778486447 0.372951405 0.562760732 0.248250936
# [43] 0.970290378 0.686440587 0.090225996 0.338522838 0.281637158 0.548396005
# [49] 0.917499539 0.554677285
I used mapply()
here because the lower.tail
argument is not recycled by pnorm()
.