Home > OS >  Count 'Yes' in Specific Columns for Each Row in Loop
Count 'Yes' in Specific Columns for Each Row in Loop

Time:09-12

I have a loop going through my data frame to fill a new column based on other column values. I need to identify if there is a 'yes' in more than one column.

Dataset:

df <- data.frame(name=c("AJ", "DJ", "EJ", "MJ", "CJ"),
                 meds_1=c("yes","yes", "no", "no", "yes"), 
                 meds_2=c("no", "no","no", "yes", "yes"),
                 meds_3=c("yes", "no","no", "no", "no"),
                 meds_4=c("no", "no","no", "yes", "yes"),
                 status=c("Non-stable","Non-stable","stable", "stable", "Non-stable"))

#the new column
df$status_check <- NA

The order should go:

If status == 'stable', then status_check = 'stable', else if someone has 'yes' in more than one column that starts with 'meds_', then status_check = 'combo', else 'other'.

The issue is that I'm not sure how to loop over the columns that start with 'meds_', count the number of 'yes' and if it is over 1, then status_check is 'Combo'

My loop so far:

#I am not sure what function needs to go into the second ifelse statement.
for(i in 1:nrow(df)){
 df$status_check<-ifelse(df$status == "stable", "stable",
                            ifelse([some function],"Combo","Other"))
}

The results should look like:

  name meds_1 meds_2 meds_3 meds_4     status status_check
1   AJ    yes     no     no     no Non-stable        Other
2   DJ    yes     no     no     no Non-stable        Other
3   EJ     no     no     no     no     stable       stable
4   MJ     no    yes     no     no     stable       stable
5   CJ    yes    yes     no     no Non-stable        Combo

CodePudding user response:

You could use rowSums to compute the number of yes columns for each row. Also, there is no need for a for loop.

Note: Your example data differed from the one you posted for your desired result, i.e. there are two yes in row 1.

df <- data.frame(
  name = c("AJ", "DJ", "EJ", "MJ", "CJ"),
  meds_1 = c("yes", "yes", "no", "no", "yes"),
  meds_2 = c("no", "no", "no", "yes", "yes"),
  meds_3 = c("yes", "no", "no", "no", "no"),
  meds_4 = c("no", "no", "no", "yes", "yes"),
  status = c("Non-stable", "Non-stable", "stable", "stable", "Non-stable")
)

df$status_check <- ifelse(df$status == "stable", "stable", ifelse(rowSums(df[, 2:5] == "yes") > 1, "Combo", "Other"))

df
#>   name meds_1 meds_2 meds_3 meds_4     status status_check
#> 1   AJ    yes     no    yes     no Non-stable        Combo
#> 2   DJ    yes     no     no     no Non-stable        Other
#> 3   EJ     no     no     no     no     stable       stable
#> 4   MJ     no    yes     no    yes     stable       stable
#> 5   CJ    yes    yes     no    yes Non-stable        Combo

Created on 2022-09-11 with reprex v2.0.2

CodePudding user response:

With dplyr, we can start with rowSums to get all rows with >= 2 yes in meds. starts_with is very useful for this use case. Then a simple case_when does the trick.

Observation I used the data from the desired output dataframe, as it differs from the original data.

library(dplyr)

df %>%
    mutate(status_check = rowSums(across(starts_with('meds_'), ~.x == 'yes')),
           status_check =
               case_when(status == 'stable' ~ 'stable',
                         status_check >=2 ~ 'combo',
                         TRUE ~ "other"))

  name meds_1 meds_2 meds_3 meds_4     status status_check
1   AJ    yes     no     no     no Non-stable        other
2   DJ    yes     no     no     no Non-stable        other
3   EJ     no     no     no     no     stable       stable
4   MJ     no    yes     no     no     stable       stable
5   CJ    yes    yes     no     no Non-stable        combo
  • Related