How to embed if-statement within for-loop in R?

I currently have a dataframe in R that looks something like this:

Procedure 1	Procedure 2	Procedure 3	Procedure 4
D	A	NA	NA
B	F	NA	NA
Z	F	L	NA
Z	C	L	NA

Each row represents a person. I want to write a script that sets the c-section column to 1 for a row if any value in Procedure 1 through Procedure 3 equals 'A', 'B', or 'F'. Essentially, I want my dataframe to look like this:

Procedure 1	Procedure 2	Procedure 3	Procedure 4	C-section
D	A	NA	NA	1
B	F	NA	NA	1
Z	F	L	NA	1
Z	C	L	NA	0

Here is the code I have written so far. It loops through the columns and uses if/else-if statements to change the value of the c-section column.

  for(i in 1:3){
    if (df[ , i] == 'A'){
      df$csection[, i] <- 1
  } 
    else if(df[ , i] == 'B'){
      df$csection[, i] <- 1
  } 
    else if(df[ , i] == 'F'){
      df$csection[, i] <- 1
}
}

However, I don't seem to be getting the right results and the c-section column remains unchanged.

CodePudding user response：

You do not need to use loops for this kind of operation in R. First make your data reproducible:

dput(dfr)
structure(list(Procedure1 = c("D", "B", "Z", "Z"), Procedure2 = c("A", 
"F", "F", "C"), Procedure3 = c(NA, NA, "L", "L"), Procedure4 = c(NA, 
NA, NA, NA), C.Section = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

Now identify the rows that need to be changed:

check <- apply(dfr[, 1:3], 1, function(x) any(x %in% c("A", "B", "F")) )
check
# [1]  TRUE  TRUE  TRUE FALSE

Now make the changes:

dfr$C.Section[check] <- 1
dfr
#   Procedure1 Procedure2 Procedure3 Procedure4 C.Section
# 1          D          A       <NA>         NA         1
# 2          B          F       <NA>         NA         1
# 3          Z          F          L         NA         1
# 4          Z          C          L         NA         0
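
If a loop is still preferred, the original attempt could be repaired roughly as follows. This is only a sketch: it assumes the data frame from the dput() output above, and the names codes and hit are made up for illustration. The key point is that if() expects a single TRUE/FALSE, while a comparison on a whole column gives one value per row, so the comparison is used as a row index instead of an if() condition.

```r
# Example data, rebuilt from the dput() output above
dfr <- data.frame(
  Procedure1 = c("D", "B", "Z", "Z"),
  Procedure2 = c("A", "F", "F", "C"),
  Procedure3 = c(NA, NA, "L", "L"),
  Procedure4 = c(NA, NA, NA, NA),
  C.Section  = 0L
)

# Procedure codes that count as a c-section (from the question)
codes <- c("A", "B", "F")

# Loop over the first three procedure columns
for (i in 1:3) {
  # %in% gives one TRUE/FALSE per row and treats NA as FALSE
  hit <- dfr[[i]] %in% codes
  # Index the rows of C.Section directly; no if() is needed
  dfr$C.Section[hit] <- 1L
}

dfr$C.Section
# [1] 1 1 1 0
```

Using %in% rather than == also avoids NA results for the missing procedures, which would otherwise break the row indexing.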

CodePudding user response：

We can use `rowwise()` with `c_across()` from `dplyr`:

library(dplyr)

df |> rowwise() |>
    mutate(C.Section = as.numeric(any(c_across(1:3) %in% c("A", "B", "F"))))

Output

# A tibble: 4 × 5
# Rowwise: 
  Procedure1 Procedure2 Procedure3 Procedure4 C.Section
  <chr>      <chr>      <chr>      <lgl>          <dbl>
1 D          A          NA         NA                 1
2 B          F          NA         NA                 1
3 Z          F          L          NA                 1
4 Z          C          L          NA                 0

CodePudding user response：

Using built-in functions:

cols = c("Procedure1", "Procedure2", "Procedure3")

df$C.Section[] = rowSums(sapply(df[cols], `%in%`, c("A", "B", "F"))) > 0

  Procedure1 Procedure2 Procedure3 Procedure4 C.Section
1          D          A       <NA>         NA         1
2          B          F       <NA>         NA         1
3          Z          F          L         NA         1
4          Z          C          L         NA         0
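
The assignment with [] above updates an existing C.Section column in place, so it presumes that column is already there. A minimal sketch for the case where the column still has to be created, assuming the same example data (the names codes and hits are illustrative, not from the answer):

```r
# Example data without a C.Section column
df <- data.frame(
  Procedure1 = c("D", "B", "Z", "Z"),
  Procedure2 = c("A", "F", "F", "C"),
  Procedure3 = c(NA, NA, "L", "L"),
  Procedure4 = c(NA, NA, NA, NA)
)

cols  <- c("Procedure1", "Procedure2", "Procedure3")
codes <- c("A", "B", "F")

# sapply() returns a logical matrix: one row per person, one column per procedure
hits <- sapply(df[cols], `%in%`, codes)

# A row is flagged when at least one of its procedures matched
df$C.Section <- as.integer(rowSums(hits) > 0)

df$C.Section
# [1] 1 1 1 0
```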