I have a dataset that looks like this:
data <- data.frame(Subject = c("A","B","C"),
Col1 = c("Yes", "Yes", "No"),
Col2 = c("Yes", "Yes", "Yes"),
Col3 = c("Yes", "Yes", "Yes")
)
print(data)
Subject Col1 Col2 Col3
1 A Yes Yes Yes
2 B Yes Yes Yes
3 C No Yes Yes
I want to summarize whether all of the columns equal "Yes". If so, the new column is "Yes"; if any of the columns
is NA or "No", then the summary column is "No".
My current code looks something like this, but I feel like there is an easier way:
data %>%
group_by(Subject) %>%
summarize(Summary = case_when(
Col1 == "Yes" & Col2 == "Yes" & Col3 == "Yes" ~ "Yes",
Col1 != "Yes" & Col2 != "Yes" & Col3 != "Yes" ~ "No",
TRUE ~ NA_character_
))
CodePudding user response:
We may use if_all/if_any
library(dplyr)
data %>%
mutate(Summary = case_when(if_all(starts_with("Col"),
~. == "Yes") ~ "Yes", TRUE ~ "No"))
-output
Subject Col1 Col2 Col3 Summary
1 A Yes Yes Yes Yes
2 B Yes Yes Yes Yes
3 C No Yes Yes No
CodePudding user response:
data %>%
mutate(newcol = rowSums(select(cur_data(), starts_with("Col")) != "Yes") == 0)
# Subject Col1 Col2 Col3 newcol
# 1 A Yes Yes Yes TRUE
# 2 B Yes Yes Yes TRUE
# 3 C No Yes Yes FALSE
That gets you a simple logical column; in general, when a column is a truth-like property, I prefer logical. If you want literal strings instead, then
data %>%
mutate(newcol = if_else(rowSums(select(cur_data(), starts_with("Col")) != "Yes") == 0, "Yes", "No"))
# Subject Col1 Col2 Col3 newcol
# 1 A Yes Yes Yes Yes
# 2 B Yes Yes Yes Yes
# 3 C No Yes Yes No
As I learn dplyr's more "recent" verbs (relatively speaking), akrun's recommendation to use if_all makes a lot more sense here, where the above can be done more succinctly as
data %>%
mutate(newcol = if_else(if_all(starts_with("Col"), ~ . == "Yes"), "Yes", "No"))
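One caveat, since the question mentions NA: `. == "Yes"` evaluates to NA for a missing value, so a row whose only non-"Yes" entries are missing gets NA from the if_else() version rather than "No". A minimal sketch of one way around that, assuming dplyr >= 1.0.4 (for if_all) and a hypothetical data_na frame with an extra row D that is not in the original data:

```r
library(dplyr)

# Hypothetical data: the original three rows plus a row with a missing value
data_na <- data.frame(
  Subject = c("A", "B", "C", "D"),
  Col1 = c("Yes", "Yes", "No", NA),
  Col2 = c("Yes", "Yes", "Yes", "Yes"),
  Col3 = c("Yes", "Yes", "Yes", "Yes")
)

# %in% returns FALSE (never NA) for missing values, so row D becomes "No"
result <- data_na %>%
  mutate(newcol = if_else(if_all(starts_with("Col"), ~ . %in% "Yes"), "Yes", "No"))

print(result)
```

The case_when(..., TRUE ~ "No") form in akrun's answer already sends such rows to "No", because an NA condition falls through to the TRUE branch.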
CodePudding user response:
Another possible solution, in base R:
data$Summary <- rowSums((data[-1] != "Yes")) == 0
data
#> Subject Col1 Col2 Col3 Summary
#> 1 A Yes Yes Yes TRUE
#> 2 B Yes Yes Yes TRUE
#> 3 C No Yes Yes FALSE
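If the columns can contain NA, as the question suggests, rowSums() without na.rm returns NA for any row with a missing value, and so does the comparison with 0. A rough base R sketch that treats NA as "not Yes", again using a hypothetical data_na frame with an added row D and selecting columns by a "Col" name prefix (an assumption about the naming):

```r
# Hypothetical data: the original three rows plus a row with a missing value
data_na <- data.frame(
  Subject = c("A", "B", "C", "D"),
  Col1 = c("Yes", "Yes", "No", NA),
  Col2 = c("Yes", "Yes", "Yes", "Yes"),
  Col3 = c("Yes", "Yes", "Yes", "Yes")
)

# Pick the columns to check by name rather than by position
cols <- grep("^Col", names(data_na), value = TRUE)

# Logical matrix: TRUE only where the value is present and equal to "Yes"
is_yes <- !is.na(data_na[cols]) & data_na[cols] == "Yes"

# A row qualifies only if every checked column is "Yes"
data_na$Summary <- rowSums(is_yes) == length(cols)

print(data_na)
```

Wrapping the last expression in ifelse(..., "Yes", "No") would give strings instead of a logical column.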
CodePudding user response:
Thanks to everyone for the suggestions. Here is the solution I am moving forward with:
data %>%
mutate(Summary = if_else(rowSums(.[c(2:4)]!="Yes")>0,"No", "Yes"))
*Note: I substituted .[c(3:12)] for .[c(2:4)] in my actual data frame.