How to create a count variable based on other variables

Time:01-20

I am quite new to R and am struggling with the following issue. I have a dataset that contains demographic characteristics and several variables based on qualitative data that we coded. Six of these variables capture tone and are called Tone1, Tone2, Tone3, Tone4, Tone5, and Tone6. The tone variables are categorical (1 = positive; 2 = negative; 3 = neutral), and each respondent can have more than one tone. I am trying to create two variables: one is the count of 1s (positive) across the six tone columns for each observation, and the other is the count of 2s (negative) across the same six columns. I have not been able to find exactly what I am looking for online.

The df looks somewhat like this:

resp  Tone1  Tone2  Tone3  Tone4  Tone5  Tone6
a         1      2      1      1      2     NA
b         2      2     NA     NA     NA     NA
c         3      1     NA     NA     NA     NA
d         1      1      2      2      1      1

# Creating example df
df <- data.frame( resp = c("a", "b",
                          "c", "d"),
                 Tone1 = c(1, 2, 3, 1),
                 Tone2 = c(2, 2, 1, 1),
                 Tone3 = c(1, NA, NA, 2),
                 Tone4 = c(1, NA, NA, 2), 
                 Tone5 = c(2, NA, NA, 1),
                 Tone6 = c(NA, NA, NA, 1)) 

and I am looking to get this:

resp  Tone1  Tone2  Tone3  Tone4  Tone5  Tone6  count_pos  count_neg
a         1      2      1      1      2     NA          3          2
b         2      2     NA     NA     NA     NA          0          2
c         3      1     NA     NA     NA     NA          1          0
d         1      1      2      2      1      1          4          2

I have tried the following, but it did not give me the results I wanted. Instead, I end up with a column called count_pos[,"Tone6"] that is filled with NA.

gesis$count_pos <- 0
for (i in 1:6) {
  gesis$count_pos <- gesis$count_pos + ifelse(gesis[,paste0("Tone",i)]==1,1,0)
}


I really appreciate any suggestions and thanks in advance! I really hope this isn't too complicated.

CodePudding user response:

You can avoid the for loop and use rowSums instead:

 df$count_pos <- rowSums(df[, 2:7]==1, na.rm=TRUE)
 df$count_neg <- rowSums(df[, 2:7]==2, na.rm=TRUE)
 df
  resp Tone1 Tone2 Tone3 Tone4 Tone5 Tone6 count_pos count_neg
1    a     1     2     1     1     2    NA         3         2
2    b     2     2    NA    NA    NA    NA         0         2
3    c     3     1    NA    NA    NA    NA         1         0
4    d     1     1     2     2     1     1         4         2
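
As a small variation on the answer above, the tone columns can be selected by name rather than by position, which keeps the code working if columns are later reordered. The sketch below also includes a loop version for comparison. It assumes the question's loop failed for two reasons: any NA in a row propagates through the addition, and gesis may be a tibble, in which case `gesis[, "Tone1"]` returns a one-column table rather than a vector (the question does not say which it is). The names `tone_cols` and `loop_pos` are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  resp  = c("a", "b", "c", "d"),
  Tone1 = c(1, 2, 3, 1),
  Tone2 = c(2, 2, 1, 1),
  Tone3 = c(1, NA, NA, 2),
  Tone4 = c(1, NA, NA, 2),
  Tone5 = c(2, NA, NA, 1),
  Tone6 = c(NA, NA, NA, 1)
)

# Select the tone columns by name instead of by position (hypothetical helper)
tone_cols <- paste0("Tone", 1:6)

# df[tone_cols] == 1 is a logical matrix; NA cells stay NA and are
# dropped from the row totals by na.rm = TRUE
df$count_pos <- rowSums(df[tone_cols] == 1, na.rm = TRUE)
df$count_neg <- rowSums(df[tone_cols] == 2, na.rm = TRUE)

# Loop version for comparison: [[ extracts a plain vector, and
# %in% returns FALSE (not NA) for missing values
loop_pos <- integer(nrow(df))
for (col in tone_cols) {
  loop_pos <- loop_pos + (df[[col]] %in% 1)
}
```

Both approaches should give the same positive counts; the rowSums version is shorter and avoids the explicit loop.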

CodePudding user response:

Here's a tidy way:

library(dplyr)
df <- data.frame( resp = c("a", "b",
                           "c", "d"),
                  Tone1 = c(1, 2, 3, 1),
                  Tone2 = c(2, 2, 1, 1),
                  Tone3 = c(1, NA, NA, 2),
                  Tone4 = c(1, NA, NA, 2), 
                  Tone5 = c(2, NA, NA, 1),
                  Tone6 = c(NA, NA, NA, 1)) 

df %>% 
  rowwise() %>% 
  mutate(tone_pos = sum(c_across(contains("Tone")) == 1, na.rm=TRUE), 
         tone_neg = sum(c_across(contains("Tone")) == 2, na.rm=TRUE))
#> # A tibble: 4 × 9
#> # Rowwise: 
#>   resp  Tone1 Tone2 Tone3 Tone4 Tone5 Tone6 tone_pos tone_neg
#>   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <int>    <int>
#> 1 a         1     2     1     1     2    NA        3        2
#> 2 b         2     2    NA    NA    NA    NA        0        2
#> 3 c         3     1    NA    NA    NA    NA        1        0
#> 4 d         1     1     2     2     1     1        4        2

Created on 2023-01-19 by the reprex package (v2.0.1)
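
If rowwise() turns out to be slow on a larger dataset, one possible alternative is to combine rowSums with across() inside an ordinary mutate(). This is only a sketch, not the answerer's method. It selects the columns as the range Tone1:Tone6 rather than with contains("Tone"), because contains() matches case-insensitively by default and would also match any other column whose name contains "tone". The name `result` is made up for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  resp  = c("a", "b", "c", "d"),
  Tone1 = c(1, 2, 3, 1),
  Tone2 = c(2, 2, 1, 1),
  Tone3 = c(1, NA, NA, 2),
  Tone4 = c(1, NA, NA, 2),
  Tone5 = c(2, NA, NA, 1),
  Tone6 = c(NA, NA, NA, 1)
)

# across() builds a table of TRUE/FALSE/NA flags for the six tone columns;
# rowSums() then counts the TRUE values per row, ignoring NA
result <- df %>%
  mutate(
    count_pos = rowSums(across(Tone1:Tone6, ~ .x == 1), na.rm = TRUE),
    count_neg = rowSums(across(Tone1:Tone6, ~ .x == 2), na.rm = TRUE)
  )
```

Unlike the rowwise() version, the result is an ungrouped data frame, so later steps do not need an ungroup() call.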
