I am sure the solution to my problem is simple, but I am new to coding and cannot find the answer online. I am working with a dataset of qualitative data that was collected and coded. The dataset includes variables named Code1, Code2, Code3 and Code4; every respondent has at least one code and can have several. I am trying to add a variable that reflects the number of codes given to each participant. The data look something like this, where the numerical values are the codes we assigned based on each response:
ID  Code1  Code2  Code3  Code4
1   5      NA     NA     NA
2   7      6      4      NA
3   5      12     NA     NA
The variable I want to add is named Count and would look like this:
ID  Code1  Code2  Code3  Code4  Count
1   5      NA     NA     NA     1
2   7      6      4      NA     3
3   5      12     NA     NA     2
Participant 1 would have 1 under Count because they received only one code, participant 2 would have 3 because they have three codes, and participant 3 would have 2 because they were assigned two codes.
I have tried using the ifelse() function with NA, since an NA signals that fewer codes were assigned, but ifelse() only gives me two outcomes, so my count variable can only take two different values when it needs to go up to 4. I have also tried case_when() but get this error message: Error: Case 7 (!is.na(Code1) ~ 1) must be a two-sided formula, not a logical vector.
Here is an example of what I have tried:
df$count = ifelse(is.na(df$Code2),1,2)
df$count = ifelse(is.na(df$Code3),2,3)
df$count = ifelse(is.na(df$Code4),3,4)
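Each of these three calls overwrites df$count, so only the last one takes effect. ifelse() can return more than two values if the calls are nested, with each "no" branch moving on to the next column. A minimal sketch, assuming codes are always filled from Code1 onward with no gaps (if Code2 is NA, then Code3 and Code4 are NA too); the small data frame is only an illustrative copy of the example above:

```r
# Illustrative data matching the example in the question
df <- data.frame(
  ID    = c(1, 2, 3),
  Code1 = c(5, 7, 5),
  Code2 = c(NA, 6, 12),
  Code3 = c(NA, 4, NA),
  Code4 = c(NA, NA, NA)
)

# Nested ifelse(): the first NA found marks where the codes stop.
# Assumes no gaps between filled code columns.
df$count <- ifelse(is.na(df$Code2), 1,
            ifelse(is.na(df$Code3), 2,
            ifelse(is.na(df$Code4), 3, 4)))

print(df$count)
```

With gaps (for example Code1 and Code3 filled but Code2 empty) this would undercount, which is why counting the non-missing values directly is more robust.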
I have also tried:
df <- df %>%
  mutate(count = case_when(!is.na(Code1) ~ 1,
                           !is.na(Code2) ~ 2,
                           !is.na(Code3) ~ 3,
                           !is.na(Code4) ~ 4,
                           xor(Code1, Code2)))
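The error most likely comes from the last argument, xor(Code1, Code2), which is a logical vector rather than a two-sided formula. Separately, case_when() returns the value of the first condition that is TRUE, so with !is.na(Code1) listed first every row would get 1. A sketch that drops the xor() line and tests the highest-numbered code first, again assuming codes are filled from Code1 onward with no gaps and using an illustrative copy of the example data:

```r
library(dplyr, warn.conflicts = FALSE)

# Illustrative data matching the example in the question
df <- data.frame(
  ID    = c(1, 2, 3),
  Code1 = c(5, 7, 5),
  Code2 = c(NA, 6, 12),
  Code3 = c(NA, 4, NA),
  Code4 = c(NA, NA, NA)
)

# case_when() uses the first TRUE condition per row,
# so check Code4 before Code3, and so on.
df <- df %>%
  mutate(count = case_when(
    !is.na(Code4) ~ 4,
    !is.na(Code3) ~ 3,
    !is.na(Code2) ~ 2,
    !is.na(Code1) ~ 1
  ))

print(df$count)
```

Rows that match none of the conditions get NA, which should not happen here since every respondent has at least one code.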
So, I cannot figure out what I am doing wrong or how to get the count variable I need. Any suggestions?
Many thanks in advance!!
Answer:
A dplyr approach using rowSums() and across():
library(dplyr, warn.conflicts = FALSE)
# count the non-missing values across all columns whose names start with "Code"
dat <- dat |>
  mutate(count = rowSums(
    across(starts_with("Code"), ~ !is.na(.x))
  ))
dat
#> ID Code1 Code2 Code3 Code4 count
#> 1 1 5 NA NA NA 1
#> 2 2 7 6 4 NA 3
#> 3 3 5 12 NA NA 2
Or using base R:
dat$count <- rowSums(
!is.na(dat[grep("^Code", names(dat), value = TRUE)])
)
dat
#> ID Code1 Code2 Code3 Code4 count
#> 1 1 5 NA NA NA 1
#> 2 2 7 6 4 NA 3
#> 3 3 5 12 NA NA 2
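Both versions rely on the same idea: !is.na() applied to a data frame returns a logical matrix, and rowSums() treats TRUE as 1 and FALSE as 0. A small illustration on a hypothetical two-row data frame (the names codes, present and n_codes are just for this example):

```r
# Hypothetical two-row example
codes <- data.frame(
  Code1 = c(5, 7),
  Code2 = c(NA, 6),
  Code3 = c(NA, NA)
)

# is.na() on a data frame returns a logical matrix; ! flips it
# so that TRUE marks a cell where a code was assigned
present <- !is.na(codes)
print(present)

# rowSums() adds TRUE as 1 and FALSE as 0: the number of codes per row
n_codes <- rowSums(present)
print(n_codes)
```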
DATA
dat <- structure(list(
  ID = c(1, 2, 3),
  Code1 = c(5L, 7L, 5L),
  Code2 = c(NA, 6L, 12L),
  Code3 = c(NA, 4L, NA),
  Code4 = c(NA, NA, NA)
), class = "data.frame", row.names = c(NA, -3L))
Answer:
I think you are looking for something like this:
Recreating the data (the %>% pipe comes from the tidyverse); you can skip this part:
library(magrittr)  # provides %>%
a = c(1, 5, NA, NA, NA)
b = c(2, 7, 6, 4, NA)
c = c(3, 5, 12, NA, NA)
df <- cbind(a,b,c) %>%
t() %>%
data.frame() %>%
setNames(c('id', 'code1', 'code2', 'code3', 'code4'))
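Solutions:
# columns to count: everything except the ID (and count, if it already exists)
code_cols <- setdiff(colnames(df), c('id', 'count'))
#a: count the non-missing values in each row
df$count <- rowSums(!is.na(df[code_cols]))
#b: the same, row by row with apply()
df$count <- apply(df[code_cols], 1, \(x) sum(!is.na(x)))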
id code1 code2 code3 code4 count
a 1 5 NA NA NA 1
b 2 7 6 4 NA 3
c 3 5 12 NA NA 2