Home > OS >  Perform a test when the column containing the condition varies (R)
Perform a test when the column containing the condition varies (R)

Time:04-26

I have at hand a dataset where answers to some questions depends on other questions asked before and need to perform an aggregation taking these questions into account (for example : "difficultyX" needs "isdifficult" == 1, and I must compute is.na(difficultyX) knowing that the condition holds)

The problem is that I have several columns all with different conditions to meet (X1 must check column Y1, x2 column Y2 etc ...)

For now I have tried pivoting the table and joining the correspondance to the condition columns and value and my DB looks like this :

after pivoting and joining

test<-tibble(Y1=1:3,Y2=1:3,var_to_test=c("x1","x2","x3"),condition=c("Y1","Y2","Y2"),value=c(1,2,2))

and I want the output to lok like this :


testoutput<-tibble(Y1=1:3,Y2=1:3,var_to_test=c("x1","x2","x3"),condition=c("Y1","Y2","Y2"),value=c(1,2,1),cond_verif=c(T,T,F))

for now I can perform the test I want for one row with

#this works
test[[1,test$condition[1]]]==test$value[1]

#this does not
test[[,test$condition]]==test$value

#this one takes awfully long (2 secs for 10K obs, in long format I have 700K of them)

for(i in 1:3){
    vec[i]<-test[[i,test$condition[i]]]==test$value[i]
 }

Therefore I am looking for a generalization which would work in a reasonable amount of time, it can be with map functions, apply functions, dplyr or even base R but I have yet been unable to figure it out...

Thank you for your time

CodePudding user response:

library(dplyr)
library(tidyr)

test<-tibble(Y1=1:3,Y2=1:3,var_to_test=c("x1","x2","x3"),condition=c("Y1","Y2","Y2"),value=c(1,2,2))

Possible solution using tidyr::pivot_longer:

1.Bring data in long form with a column specifying the (expected?) value for each test.

2.Filter out unused rows (rows where test_condition does not match the test that was applied)

3.Compare the values and create new column cond_verif

test %>% 
  pivot_longer(col = c(Y1, Y2), names_to = "test_condition", values_to = "test_value") %>% 
  filter(condition == test_condition) %>% 
  mutate(cond_verif = value == test_value)

This returns:

# A tibble: 3 x 6
  var_to_test condition value test_condition test_value cond_verif
  <chr>       <chr>     <dbl> <chr>               <int> <lgl>     
1 x1          Y1            1 Y1                      1 TRUE      
2 x2          Y2            2 Y2                      2 TRUE      
3 x3          Y2            2 Y2                      3 FALSE   
  • Related