I have at hand a dataset where answers to some questions depends on other questions asked before and need to perform an aggregation taking these questions into account (for example : "difficultyX" needs "isdifficult" == 1, and I must compute is.na(difficultyX) knowing that the condition holds)
The problem is that I have several columns all with different conditions to meet (X1 must check column Y1, x2 column Y2 etc ...)
For now I have tried pivoting the table and joining the correspondance to the condition columns and value and my DB looks like this :
after pivoting and joining
test<-tibble(Y1=1:3,Y2=1:3,var_to_test=c("x1","x2","x3"),condition=c("Y1","Y2","Y2"),value=c(1,2,2))
and I want the output to lok like this :
testoutput<-tibble(Y1=1:3,Y2=1:3,var_to_test=c("x1","x2","x3"),condition=c("Y1","Y2","Y2"),value=c(1,2,1),cond_verif=c(T,T,F))
for now I can perform the test I want for one row with
#this works
test[[1,test$condition[1]]]==test$value[1]
#this does not
test[[,test$condition]]==test$value
#this one takes awfully long (2 secs for 10K obs, in long format I have 700K of them)
for(i in 1:3){
vec[i]<-test[[i,test$condition[i]]]==test$value[i]
}
Therefore I am looking for a generalization which would work in a reasonable amount of time, it can be with map functions, apply functions, dplyr or even base R but I have yet been unable to figure it out...
Thank you for your time
CodePudding user response:
library(dplyr)
library(tidyr)
test<-tibble(Y1=1:3,Y2=1:3,var_to_test=c("x1","x2","x3"),condition=c("Y1","Y2","Y2"),value=c(1,2,2))
Possible solution using tidyr::pivot_longer
:
1.Bring data in long form with a column specifying the (expected?) value for each test.
2.Filter out unused rows (rows where test_condition does not match the test that was applied)
3.Compare the values and create new column cond_verif
test %>%
pivot_longer(col = c(Y1, Y2), names_to = "test_condition", values_to = "test_value") %>%
filter(condition == test_condition) %>%
mutate(cond_verif = value == test_value)
This returns:
# A tibble: 3 x 6
var_to_test condition value test_condition test_value cond_verif
<chr> <chr> <dbl> <chr> <int> <lgl>
1 x1 Y1 1 Y1 1 TRUE
2 x2 Y2 2 Y2 2 TRUE
3 x3 Y2 2 Y2 3 FALSE