I currently have a dataframe in R that looks something like this:
Procedure 1 | Procedure 2 | Procedure 3 | Procedure 4 | C-section |
---|---|---|---|---|
D | A | NA | NA | 0 |
B | F | NA | NA | 0 |
Z | F | L | NA | 0 |
Z | C | L | NA | 0 |
Each row represents a person. I want to write a script that sets the C-section column for a row to 1 if any of Procedure 1 to Procedure 3 in that row equals 'A', 'B', or 'F'. Essentially, I want my dataframe to look like this:
Procedure 1 | Procedure 2 | Procedure 3 | Procedure 4 | C-section |
---|---|---|---|---|
D | A | NA | NA | 1 |
B | F | NA | NA | 1 |
Z | F | L | NA | 1 |
Z | C | L | NA | 0 |
Here is the code I have written so far. It loops through the columns and uses if/else-if statements to change the value of the C-section column:
for(i in 1:3){
  if (df[ , i] == 'A'){
    df$csection[, i] <- 1
  }
  else if(df[ , i] == 'B'){
    df$csection[, i] <- 1
  }
  else if(df[ , i] == 'F'){
    df$csection[, i] <- 1
  }
}
However, I'm not getting the right results: the C-section column remains unchanged.
Answer:
You don't need a loop for this kind of operation in R. First, make your data reproducible:
dput(dfr)
structure(list(Procedure1 = c("D", "B", "Z", "Z"), Procedure2 = c("A",
"F", "F", "C"), Procedure3 = c(NA, NA, "L", "L"), Procedure4 = c(NA,
NA, NA, NA), C.Section = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
Now identify the rows that need to be changed:
# apply(..., 1, ...) runs the function once per row of the first three columns
check <- apply(dfr[, 1:3], 1, function(x) any(x %in% c("A", "B", "F")))
check
# [1] TRUE TRUE TRUE FALSE
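If you would rather avoid apply(), which first converts the selected columns into a character matrix, the same check vector can be built column by column and combined with a logical OR. This is a minimal sketch rebuilding the data above with data.frame(); `hits` is just an illustrative name:

```r
dfr <- data.frame(
  Procedure1 = c("D", "B", "Z", "Z"),
  Procedure2 = c("A", "F", "F", "C"),
  Procedure3 = c(NA, NA, "L", "L"),
  Procedure4 = c(NA, NA, NA, NA),
  C.Section = c(0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One logical vector per column: TRUE where that cell is "A", "B" or "F".
# NA cells give FALSE, because NA is not in the lookup set.
hits <- lapply(dfr[, 1:3], `%in%`, c("A", "B", "F"))

# OR the three vectors together element-wise: TRUE if any procedure matched.
check <- Reduce(`|`, hits)
check
```

This gives the same check vector as the apply() version, so the assignment step below works unchanged.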
Now make the changes:
dfr$C.Section[check] <- 1
dfr
# Procedure1 Procedure2 Procedure3 Procedure4 C.Section
# 1 D A <NA> NA 1
# 2 B F <NA> NA 1
# 3 Z F L NA 1
# 4 Z C L NA 0
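For completeness, the loop in the question fails for two reasons. `df[ , i] == 'A'` compares a whole column and returns one TRUE/FALSE per row, but `if` needs a single value (R 4.2.0 and later raise an error; older versions used only the first element with a warning). Also, `df$csection[, i]` indexes a column vector as if it had two dimensions. A loop that visits one cell at a time does work, although it is longer and slower than the vectorized versions. A sketch, assuming the column is named `C.Section` as in the dput() output:

```r
df <- data.frame(
  Procedure1 = c("D", "B", "Z", "Z"),
  Procedure2 = c("A", "F", "F", "C"),
  Procedure3 = c(NA, NA, "L", "L"),
  Procedure4 = c(NA, NA, NA, NA),
  C.Section = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

targets <- c("A", "B", "F")

for (i in seq_len(nrow(df))) {   # one iteration per row (person)
  for (j in 1:3) {               # Procedure1 to Procedure3
    # df[i, j] is a single cell, so `if` receives exactly one TRUE/FALSE;
    # %in% returns FALSE (not NA) for missing cells.
    if (df[i, j] %in% targets) {
      df$C.Section[i] <- 1       # a column is a vector: use one subscript
      break                      # one match is enough for this row
    }
  }
}
df$C.Section
```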
Answer:
- We can use `rowwise()` with `c_across()` from dplyr:
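library(dplyr)
# rowwise() makes c_across() see one row at a time;
# any() then checks whether that row contains "A", "B" or "F".
df |> rowwise() |>
  mutate(C.Section = as.numeric(any(c_across(1:3) %in% c("A", "B", "F"))))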
- Output
# A tibble: 4 × 5
# Rowwise:
Procedure1 Procedure2 Procedure3 Procedure4 C.Section
<chr> <chr> <chr> <lgl> <dbl>
1 D A NA NA 1
2 B F NA NA 1
3 Z F L NA 1
4 Z C L NA 0
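rowwise() evaluates the expression once per row, which can get slow on large data. Assuming a dplyr version that provides if_any() (added in the 1.0 series), the same flag can be computed column-wise in one vectorized call. A minimal sketch, again rebuilding the example data:

```r
library(dplyr)

df <- data.frame(
  Procedure1 = c("D", "B", "Z", "Z"),
  Procedure2 = c("A", "F", "F", "C"),
  Procedure3 = c(NA, NA, "L", "L"),
  Procedure4 = c(NA, NA, NA, NA),
  C.Section = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# if_any() applies the predicate to each selected column and ORs the
# results row by row, so no rowwise() grouping is needed.
df <- df |>
  mutate(C.Section = as.numeric(
    if_any(Procedure1:Procedure3, function(x) x %in% c("A", "B", "F"))
  ))
df
```

Because nothing is grouped, the result is a plain data frame and needs no ungroup() afterwards.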
Answer:
Using only base R functions:
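cols = c("Procedure1", "Procedure2", "Procedure3")
# sapply() gives a logical matrix (rows x cols); a row sum > 0 means at least one match.
# Assigning with [] keeps C.Section's integer type, so TRUE/FALSE become 1/0.
df$C.Section[] = rowSums(sapply(df[cols], `%in%`, c("A", "B", "F"))) > 0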
Procedure1 Procedure2 Procedure3 Procedure4 C.Section
1 D A <NA> NA 1
2 B F <NA> NA 1
3 Z F L NA 1
4 Z C L NA 0
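One caveat with the sapply() version: when the data frame has a single row, sapply() simplifies its result to a plain vector instead of a matrix, and rowSums() then fails. A sketch that keeps the matrix shape for any number of rows, using as.matrix() and %in% (the one-row data here is hypothetical, chosen only to show the edge case):

```r
df <- data.frame(
  Procedure1 = "Z", Procedure2 = "C", Procedure3 = "A", Procedure4 = NA,
  C.Section = 0L,
  stringsAsFactors = FALSE
)
cols <- c("Procedure1", "Procedure2", "Procedure3")
targets <- c("A", "B", "F")

m <- as.matrix(df[cols])                        # character matrix, one column per procedure
hits <- matrix(m %in% targets, nrow = nrow(m))  # %in% drops dims, so restore them
df$C.Section[] <- rowSums(hits) > 0             # [] keeps the integer type
df$C.Section
```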