R Summarize "Yes" if all columns are "Yes"

Time:06-16

I have a dataset that looks like this:

data <- data.frame(Subject = c("A", "B", "C"),
                   Col1 = c("Yes", "Yes", "No"),
                   Col2 = c("Yes", "Yes", "Yes"),
                   Col3 = c("Yes", "Yes", "Yes"))

print(data)

  Subject Col1 Col2 Col3
1       A  Yes  Yes  Yes
2       B  Yes  Yes  Yes
3       C   No  Yes  Yes

I want to summarize whether all of the columns equal "Yes". If so, the new column is "Yes"; if any of the columns is NA or "No", the summary column is "No".

My current code looks something like this, but I feel like there is an easier way:

library(dplyr)

data %>%
  group_by(Subject) %>%
  summarize(Summary = case_when(
    # every column is "Yes"
    Col1 == "Yes" & Col2 == "Yes" & Col3 == "Yes" ~ "Yes",
    # no column is "Yes"
    Col1 != "Yes" & Col2 != "Yes" & Col3 != "Yes" ~ "No",
    TRUE ~ NA_character_
  ))

CodePudding user response:

We may use if_all() (or if_any()) inside case_when():

library(dplyr)

data %>%
  mutate(Summary = case_when(
    if_all(starts_with("Col"), ~ . == "Yes") ~ "Yes",
    TRUE ~ "No"
  ))

-output

  Subject Col1 Col2 Col3 Summary
1       A  Yes  Yes  Yes     Yes
2       B  Yes  Yes  Yes     Yes
3       C   No  Yes  Yes      No
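
The question also asks that a missing value count as "No". A minimal sketch to check that, assuming dplyr 1.0.4 or later (for if_all) and a made-up variant of the example data, data_na, in which subject B has an NA: the comparison yields NA for that row, case_when() treats an NA condition as not matched, and the row falls through to the TRUE ~ "No" branch.

```r
library(dplyr)

# Hypothetical variant of the example data: subject B has a missing Col1
data_na <- data.frame(Subject = c("A", "B", "C"),
                      Col1 = c("Yes", NA, "No"),
                      Col2 = c("Yes", "Yes", "Yes"),
                      Col3 = c("Yes", "Yes", "Yes"))

result <- data_na %>%
  mutate(Summary = case_when(
    # TRUE only when every Col* value in the row equals "Yes"
    if_all(starts_with("Col"), ~ . == "Yes") ~ "Yes",
    # NA or FALSE conditions above end up here
    TRUE ~ "No"
  ))

result$Summary
# "Yes" "No" "No"
```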

CodePudding user response:

data %>%
  mutate(newcol = rowSums(select(cur_data(), starts_with("Col")) != "Yes") == 0)
#   Subject Col1 Col2 Col3 newcol
# 1       A  Yes  Yes  Yes   TRUE
# 2       B  Yes  Yes  Yes   TRUE
# 3       C   No  Yes  Yes  FALSE

That gets you a simple logical column; in general, when a column represents a true/false property, I prefer a logical type. If you want literal strings instead, then:

data %>%
  mutate(newcol = if_else(rowSums(select(cur_data(), starts_with("Col")) != "Yes") == 0, "Yes", "No"))
#   Subject Col1 Col2 Col3 newcol
# 1       A  Yes  Yes  Yes    Yes
# 2       B  Yes  Yes  Yes    Yes
# 3       C   No  Yes  Yes     No

As I learn dplyr's more "recent" verbs (relatively speaking), akrun's recommendation to use if_all makes a lot more sense here; the above can be written more succinctly as:

data %>%
  mutate(newcol = if_else(if_all(starts_with("Col"), ~ . == "Yes"), "Yes", "No"))
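
On the earlier point about preferring a logical column, here is a small base R sketch of what that can make easier, using the example data from the question. The names n_all_yes and all_yes_subj are made up for illustration.

```r
data <- data.frame(Subject = c("A", "B", "C"),
                   Col1 = c("Yes", "Yes", "No"),
                   Col2 = c("Yes", "Yes", "Yes"),
                   Col3 = c("Yes", "Yes", "Yes"))

# Logical flag: TRUE when every Col* value in the row is "Yes"
data$newcol <- rowSums(data[c("Col1", "Col2", "Col3")] != "Yes") == 0

# A logical column can be summed (count of TRUE) ...
n_all_yes <- sum(data$newcol)              # 2 for this data

# ... and used directly as a row filter
all_yes_subj <- data$Subject[data$newcol]  # subjects A and B
```

With a "Yes"/"No" string column, both steps would need an extra comparison such as data$newcol == "Yes".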

CodePudding user response:

Another possible solution, in base R:

data$Summary <- rowSums(data[-1] != "Yes") == 0
data

#>   Subject Col1 Col2 Col3 Summary
#> 1       A  Yes  Yes  Yes    TRUE
#> 2       B  Yes  Yes  Yes    TRUE
#> 3       C   No  Yes  Yes   FALSE
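
One caveat with the rowSums() approach: a missing value makes the comparison NA, so the row's result is NA rather than FALSE. If NA should count as "not Yes", as the question asks, one possible variant is to count the "Yes" values with na.rm = TRUE and compare against the number of columns. This is a sketch on a made-up variant of the data (data_na), not part of the answer above.

```r
# Hypothetical variant of the example data: subject B has a missing Col1
data_na <- data.frame(Subject = c("A", "B", "C"),
                      Col1 = c("Yes", NA, "No"),
                      Col2 = c("Yes", "Yes", "Yes"),
                      Col3 = c("Yes", "Yes", "Yes"))

cols <- data_na[-1]  # every column except Subject

# Approach from the answer: the NA propagates, so row B becomes NA
plain <- rowSums(cols != "Yes") == 0

# Counting "Yes" with na.rm = TRUE treats a missing value as "not Yes"
na_safe <- rowSums(cols == "Yes", na.rm = TRUE) == ncol(cols)

unname(plain)    # TRUE    NA FALSE
unname(na_safe)  # TRUE FALSE FALSE
```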

CodePudding user response:

Thanks to everyone for the suggestions. Here is the solution I am moving forward with:

data %>%
  mutate(Summary = if_else(rowSums(.[c(2:4)] != "Yes") > 0, "No", "Yes"))

Note: for my actual data frame, I used .[c(3:12)] in place of .[c(2:4)].
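
Since the column positions differ between the example and the real data, selecting the columns by name may be less fragile than hard-coding indices. A rough base R sketch, assuming the relevant columns share the "Col" prefix as in the example (the real data frame's names may differ); yes_cols and any_not_yes are illustrative names.

```r
data <- data.frame(Subject = c("A", "B", "C"),
                   Col1 = c("Yes", "Yes", "No"),
                   Col2 = c("Yes", "Yes", "Yes"),
                   Col3 = c("Yes", "Yes", "Yes"))

# Pick the columns by name prefix instead of by position
yes_cols <- names(data)[startsWith(names(data), "Col")]

# TRUE for rows where at least one selected column is not "Yes"
any_not_yes <- unname(rowSums(data[yes_cols] != "Yes") > 0)

data$Summary <- ifelse(any_not_yes, "No", "Yes")
data$Summary
# "Yes" "Yes" "No"
```

As with the positional version, a row containing NA would come out as NA here rather than "No".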
