Compare string values with multiple logical columns of dataframe-CodePudding

I have a dataframe with the following fields

class         bed_1  chair_1  bottle_1  table_1      ...

bed          TRUE    FALSE   FALSE      FALSE        ...
chair        FALSE   TRUE    FALSE      FALSE        ...
sofa         FALSE   FALSE   TRUE       FALSE        ...
table        FALSE   FALSE   FALSE      FALSE        ...

I want to compare class column with all other columns. The following is the expected output

class     new_col
bed       bed_1
chair     chair_1
bottle    bottle_1
table

So, essentially, I need to pickup column name with TRUE value for specific class.

The solution i tried takes long time due to large number of records, I am looking for an efficient way of doing this. Here is my solution.

new_df <- data.frame(class = df$class, new_col = NA)
for (row_n in 1:length(df$class) ){
indx <- which (df[row_n, ] == 'TRUE')
new_df$new_col[row_n] <- ifelse (length(indx) > 0, colnames(df)[idx], '') 
}

CodePudding user response：

Get the data in long format and filter the TRUE values.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -class) %>%
  filter(value) %>%
  select(-value)

#  class name    
#  <chr> <chr>   
#1 bed   bed_1   
#2 chair chair_1 
#3 sofa  bottle_1

If you have one TRUE value in each row you can use max.col.

new_df <- cbind(df[1], new_col = names(df[-1])[max.col(df[-1])])

To extract only one TRUE value you may do -

df %>%
  pivot_longer(cols = -class) %>%
  group_by(class) %>%
  slice(match(TRUE, value)) %>%
  select(-value)

CodePudding user response：

Here is a base R way that returns the empty string "" in case all values in a row are FALSE.

y <- apply(df1[-1], 1, \(i, x = names(df1[-1])) {
  y <- x[i]
  if(length(y)) y[1] else ""
})
df2 <- data.frame(class = df1$class, new_col = y)

df2
#  class  new_col
#1   bed    bed_1
#2 chair  chair_1
#3  sofa bottle_1
#4 table

Data

df1 <-
structure(list(class = c("bed", "chair", "sofa", "table"), 
bed_1 = c(TRUE, FALSE, FALSE, FALSE), 
chair_1 = c(FALSE, TRUE, FALSE, FALSE), 
bottle_1 = c(FALSE, FALSE, TRUE, FALSE), 
table_1 = c(FALSE, FALSE, FALSE, FALSE)), 
class = "data.frame", row.names = c(NA, -4L))