I have a loop going through my data frame to fill a new column based on other column values. I need to identify if there is a 'yes' in more than one column.
Dataset:
df <- data.frame(name=c("AJ", "DJ", "EJ", "MJ", "CJ"),
meds_1=c("yes","yes", "no", "no", "yes"),
meds_2=c("no", "no","no", "yes", "yes"),
meds_3=c("yes", "no","no", "no", "no"),
meds_4=c("no", "no","no", "yes", "yes"),
status=c("Non-stable","Non-stable","stable", "stable", "Non-stable"))
#the new column
df$status_check <- NA
The order should go:
If status == 'stable', then status_check = 'stable', else if someone has 'yes' in more than one column that starts with 'meds_', then status_check = 'combo', else 'other'.
The issue is that I'm not sure how to loop over the columns that start with 'meds_', count the number of 'yes' and if it is over 1, then status_check is 'Combo'
My loop so far:
#I am not sure what function needs to go into the second ifelse statement.
for(i in 1:nrow(df)){
df$status_check<-ifelse(df$status == "stable", "stable",
ifelse([some function],"Combo","Other"))
}
The results should look like:
name meds_1 meds_2 meds_3 meds_4 status status_check
1 AJ yes no no no Non-stable Other
2 DJ yes no no no Non-stable Other
3 EJ no no no no stable stable
4 MJ no yes no no stable stable
5 CJ yes yes no no Non-stable Combo
CodePudding user response:
You could use rowSums
to compute the number of yes
columns for each row. Also, there is no need for a for
loop.
Note: Your example data differed from the one you posted for your desired result, i.e. there are two yes
in row 1.
df <- data.frame(
name = c("AJ", "DJ", "EJ", "MJ", "CJ"),
meds_1 = c("yes", "yes", "no", "no", "yes"),
meds_2 = c("no", "no", "no", "yes", "yes"),
meds_3 = c("yes", "no", "no", "no", "no"),
meds_4 = c("no", "no", "no", "yes", "yes"),
status = c("Non-stable", "Non-stable", "stable", "stable", "Non-stable")
)
df$status_check <- ifelse(df$status == "stable", "stable", ifelse(rowSums(df[, 2:5] == "yes") > 1, "Combo", "Other"))
df
#> name meds_1 meds_2 meds_3 meds_4 status status_check
#> 1 AJ yes no yes no Non-stable Combo
#> 2 DJ yes no no no Non-stable Other
#> 3 EJ no no no no stable stable
#> 4 MJ no yes no yes stable stable
#> 5 CJ yes yes no yes Non-stable Combo
Created on 2022-09-11 with reprex v2.0.2
CodePudding user response:
With dplyr
, we can start with rowSums
to get all rows with >= 2 yes
in meds
. starts_with
is very useful for this use case.
Then a simple case_when
does the trick.
Observation I used the data from the desired output dataframe, as it differs from the original data.
library(dplyr)
df %>%
mutate(status_check = rowSums(across(starts_with('meds_'), ~.x == 'yes')),
status_check =
case_when(status == 'stable' ~ 'stable',
status_check >=2 ~ 'combo',
TRUE ~ "other"))
name meds_1 meds_2 meds_3 meds_4 status status_check
1 AJ yes no no no Non-stable other
2 DJ yes no no no Non-stable other
3 EJ no no no no stable stable
4 MJ no yes no no stable stable
5 CJ yes yes no no Non-stable combo