Comparing each row to a solutions vector with the correct answer and converting the original values b

Time:02-01

I have one data frame containing the original answers and another containing the solutions to those questions. I want to compare the answers with the solutions and convert the original values to 1/0 (correct/incorrect). A boolean would also work, but this analysis requires a numeric data type.

I believe this can easily be done with ifelse inside transmute, but the problem is that my columns are dynamic. I want the script to adapt to the number of columns the test has, so I can't refer to the columns by name in a logical test.

answers <- data.frame(x1 = c(3, 2, 1), x2 = c(2, 1, 2), x3 = c(1, 3, 1))
  x1 x2 x3
  3  2  1
  2  1  3
  1  2  1

solution <- data.frame(x1 = 3, x2 = 2, x3 = 1)
  x1 x2 x3
  3  2  1

Goal:

  x1 x2 x3
  1  1  1
  0  0  0
  0  1  1

I can solve this with a loop in base R, like this:

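df <- answers          # work on a copy of the answers
solutions <- solution  # one-row data frame with the correct values
i <- 1
while(i <= nrow(df)){
  z <- 1
  while(z <= ncol(df)){
    # leave missing answers as NA
    if(!is.na(df[i,z])){
      if(df[i,z] == solutions[1,z]){
        df[i,z] <- 1
      }else{
        df[i,z] <- 0
      }
    }
    z <- z + 1
  }
  i <- i + 1
}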

But I think this is a pretty cumbersome way of achieving this result. I was wondering whether dplyr has an easier way to achieve this? I've looked into using apply, but this is still base R and I'd like to know if this is possible within dplyr.
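For comparison, the nested loop amounts to a column-by-column equality check, which can be sketched in base R without explicit loops. This is only an illustration on the example data above; it assumes the solution data frame has exactly one row and the same column names as the answers, and the name `result` is made up for the example.

```r
answers  <- data.frame(x1 = c(3, 2, 1), x2 = c(2, 1, 2), x3 = c(1, 3, 1))
solution <- data.frame(x1 = 3, x2 = 2, x3 = 1)

# Compare each answers column with the same-named solution column.
# as.numeric() turns TRUE/FALSE into 1/0 and keeps NA as NA.
result <- answers
result[] <- Map(function(a, s) as.numeric(a == s),
                answers, solution[names(answers)])

result
#>   x1 x2 x3
#> 1  1  1  1
#> 2  0  0  0
#> 3  0  1  1
```

Because the columns are matched by name through `names(answers)`, nothing here depends on how many columns the test has.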

CodePudding user response:

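You can use across together with cur_column(), which returns the name of the column currently being processed, so each column is compared with the solution column of the same name.

library(dplyr)

df1 %>%
  mutate(
    across(everything(), ~ ifelse(.x == df2[[cur_column()]], 1, 0))
  )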

This assumes that df1 holds the answers and df2 the solutions. Including your data as reproducible data frames will help people to answer your questions, even if they are simple.
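To show how this adapts to any number of columns, here is a minimal sketch that wraps the across idea in a function and runs it on the data from the question. The helper name `score_answers` is hypothetical, and the sketch assumes the solution data frame has a single row whose column names match those of the answers (requires dplyr 1.0.0 or later for across and cur_column).

```r
library(dplyr)

answers  <- data.frame(x1 = c(3, 2, 1), x2 = c(2, 1, 2), x3 = c(1, 3, 1))
solution <- data.frame(x1 = 3, x2 = 2, x3 = 1)

# Hypothetical helper: scores every column of `answers` against the
# same-named column of a one-row `solution` data frame.
score_answers <- function(answers, solution) {
  answers %>%
    mutate(
      # cur_column() is the name of the column across() is currently processing
      across(everything(), ~ as.numeric(.x == solution[[cur_column()]]))
    )
}

scored <- score_answers(answers, solution)
scored
#>   x1 x2 x3
#> 1  1  1  1
#> 2  0  0  0
#> 3  0  1  1
```

Using as.numeric on the comparison gives the same 1/0 result as ifelse and leaves missing answers as NA.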

CodePudding user response:

@SEcker's response is much more elegant, but another alternative would be to pivot the data frames from wide to long in order to make the comparison, although this uses functions from tidyr so it's not strictly a dplyr solution.

library(tidyverse)

#create some random data
df <- tibble(X1 = ceiling(runif(10) * 5),
             X2 = ceiling(runif(10) * 5),
             X3 = ceiling(runif(10) * 5))

solutions <- tibble(X1 = ceiling(runif(10) * 5),
                    X2 = ceiling(runif(10) * 5),
                    X3 = ceiling(runif(10) * 5))


#Add a column to track the row, then pivot from wide to long
df %>% 
  mutate(row = row_number()) %>% 
  pivot_longer(-row,
               names_to = "column",
               values_to = "answer") %>% 
  #Merge with a similarly transformed solutions df
  inner_join({
    solutions %>% 
      mutate(row = row_number()) %>% 
      pivot_longer(-row,
                   names_to = "column",
                   values_to = "solution")
  }, by = c("row", "column")) %>%
  #Check whether the answer matches the solution
  mutate(is_correct = as.numeric(answer == solution)) %>% 
  #pivot back to wide
  select(row, column, is_correct) %>% 
  pivot_wider(id_cols = row,
              names_from = column,
              values_from = is_correct) %>% 
  #no longer need the row
  select(-row)


# A tibble: 10 x 3
      X1    X2    X3
   <dbl> <dbl> <dbl>
 1     1     0     0
 2     0     0     0
 3     0     1     0
 4     0     0     0
 5     1     1     0
 6     0     0     1
 7     1     1     0
 8     0     0     0
 9     0     1     1
10     0     0     0
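The example above compares each answer row with its own solution row. If the solutions are a single row, as in the question, the same pivot idea might look like the sketch below, joining on the column name only. The names `solution_long`, `correct_value` and `scored` are made up for this illustration.

```r
library(dplyr)
library(tidyr)

answers  <- tibble(x1 = c(3, 2, 1), x2 = c(2, 1, 2), x3 = c(1, 3, 1))
solution <- tibble(x1 = 3, x2 = 2, x3 = 1)

# One row per question: the column name and its correct value
solution_long <- solution %>%
  pivot_longer(everything(),
               names_to = "column",
               values_to = "correct_value")

scored <- answers %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row,
               names_to = "column",
               values_to = "answer") %>%
  # Join on the column name only, so the single solution row
  # is applied to every row of answers
  inner_join(solution_long, by = "column") %>%
  mutate(is_correct = as.numeric(answer == correct_value)) %>%
  select(row, column, is_correct) %>%
  pivot_wider(names_from = column,
              values_from = is_correct) %>%
  select(-row)

scored
```

For the question's data this should reproduce the goal table (1 1 1, 0 0 0, 0 1 1), now as a tibble.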