I need to calculate an FDR variable per group, using an expected random distribution of p values (corresponds to the "Random" type).
library(dplyr)
library(data.table)
calculate_empirical_fdr = function(control_pVal, test_pVal) {
m_control = length(control_pVal)
m_test = length(test_pVal)
unlist(lapply(test_pVal, function(significance_threshold) {
m_control = length(control_pVal)
m_test = length(test_pVal)
FP_expected = length(control_pVal[control_pVal<=significance_threshold])*m_test/m_control # number of
expected false positives in a p-value sequence with the size m_test
S = length(test_pVal[test_pVal<=significance_threshold]) # number of significant hits (FP TP)
return(FP_expected/S)
}))
}
An example dataset with groups I need to control for in the "Group" variable:
set.seed(42)
library(dplyr); library(data.table)
dataset_test = data.table(Type = c(rep("Random", 500),
rep("test1", 500),
rep("test2", 500)),
Group = sample(c("group1", "group2", "group3"), 1500, replace = T),
Pvalue = c(runif(n = 500),
rbeta(n = 500, shape1 = 1, shape2 = 4),
rbeta(n = 500, shape1 = 1, shape2 = 6))
)
I have found that the best way to use my function per group would be using a temporal variable where I can store the p values of the random type, but this does not work:
dataset_test %>%
group_by(Group) %>%
{filter(Type=="Random") %>% select(Pvalue) ->> control_set } %>%
group_by(Type, add = T) %>%
mutate(FDR_empirical = calculate_empirical_fdr(control_pVal = control_set,
test_pVal = Pvalue)) %>%
data.table()
Error in filter(Type == "Random") : object 'Type' not found
I understand that probably temporal vairables "do not see" the environment within the data.table, would be glad to hear any suggestions how to fix it.
CodePudding user response:
You can do something like this, which filters the control group P-values using the data.table special .BY
setDT(dataset_test)
dataset_test[
i= Type!="Random",
j = FDR_empirical:=calculate_empirical_fdr(dataset_test[Type=="Random" & Group ==.BY$Group, Pvalue], Pvalue),
by = .(Group, Type)
]
Output:
Type Group Pvalue FDR_empirical
1: Random group1 0.70292111 NA
2: Random group1 0.72383117 NA
3: Random group1 0.76413459 NA
4: Random group1 0.87942702 NA
5: Random group2 0.71229213 NA
---
1496: test2 group3 0.34817178 0.3681791
1497: test2 group1 0.22419118 0.2308988
1498: test2 group3 0.07258545 0.2314655
1499: test2 group2 0.24687976 0.2849462
1500: test2 group1 0.12206777 0.1760657
CodePudding user response:
Two possible solutions
Use the dot .
dataset_test %>%
group_by(Group) %>%
{filter(., Type=="Random") %>% select(Pvalue) ->> control_set; . } %>%
group_by(Type, add = T)
Use the tee-pipe %T>%
from the magrittr
package
library(magrittr)
dataset_test %>%
group_by(Group) %T>% {
filter(., Type=="Random") %>% select(Pvalue) ->> control_set} %>%
group_by(Type, add = T)