I am working with a survey data set in which each observation (respondent) is represented by its own row. I want to create a new numeric variable that counts, per row, how many of several other variables meet a condition. More specifically, the data frame contains several numeric variables (var1, var2, var3 in the example below). Each time one of those variables is >= 3 and not NA, the new variable (desiredvar) should increase by 1. As you can see in the example, desiredvar takes the value 2 for the first row, since var1 and var3 are both >= 3.
df1 <- data.frame(var1 = c(3, NA, 2, 1),
                  var2 = c(0, 0, 2, 1),
                  var3 = c(8, NA, 5, 6),
                  desiredvar = c(2, 0, 1, 1))
  var1 var2 var3 desiredvar
1    3    0    8          2
2   NA    0   NA          0
3    2    2    5          1
4    1    1    6          1
I am assuming that it should be relatively easy to code that with a for loop and/or apply, but I am not very experienced with R. I would appreciate any help!
Best, Carlo
CodePudding user response:
You can use rowSums with na.rm = TRUE:
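# Restrict the comparison to var1..var3 so other columns are never counted
df1$desiredvar <- rowSums(df1[c("var1", "var2", "var3")] >= 3, na.rm = TRUE)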
or with apply:
# Same idea, summing the logical values row by row (MARGIN = 1)
df1$desiredvar <- apply(df1[c("var1", "var2", "var3")] >= 3, 1, sum, na.rm = TRUE)
  var1 var2 var3 desiredvar
1    3    0    8          2
2   NA    0   NA          0
3    2    2    5          1
4    1    1    6          1
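To see why this works, it may help to look at the intermediate step. The following is a minimal sketch using the example data from the question. The helper names vars and meets are only illustrative and are not part of the original answer.

```r
# Example data from the question, without the hand-written desiredvar column
df1 <- data.frame(var1 = c(3, NA, 2, 1),
                  var2 = c(0, 0, 2, 1),
                  var3 = c(8, NA, 5, 6))

# Columns to check (illustrative helper, names taken from the example)
vars <- c("var1", "var2", "var3")

# The comparison yields a logical matrix; cells that were NA stay NA
meets <- df1[vars] >= 3
print(meets)

# TRUE counts as 1 and FALSE as 0; na.rm = TRUE skips the NA cells
df1$desiredvar <- rowSums(meets, na.rm = TRUE)
print(df1$desiredvar)
```

Without na.rm = TRUE, any row containing an NA would get NA as its count instead of a number.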
In dplyr, you could use the approaches above, or use rowwise and c_across:
library(dplyr)
df1 %>%
  rowwise() %>%
  mutate(desiredvar = sum(c_across(var1:var3) >= 3, na.rm = TRUE)) %>%
  ungroup()  # drop the rowwise grouping afterwards
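As a possible alternative that avoids rowwise, rowSums can be combined with across inside mutate. This is a sketch rather than part of the original answer. It assumes a dplyr version that provides across (1.0.0 or later), and the name result is only illustrative.

```r
library(dplyr)

# Example data from the question, without the hand-written desiredvar column
df1 <- data.frame(var1 = c(3, NA, 2, 1),
                  var2 = c(0, 0, 2, 1),
                  var3 = c(8, NA, 5, 6))

# across() turns var1..var3 into logical columns,
# and rowSums() then counts the TRUE values per row
result <- df1 %>%
  mutate(desiredvar = rowSums(across(var1:var3, ~ .x >= 3), na.rm = TRUE))

print(result)
```

On large data frames this vectorised form is usually faster than rowwise, since it does not evaluate the expression once per row.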