How to find out how many respondents have at least 3 missing responses?

Time:10-08

I am just starting to learn R in RStudio. I have a data set with variables v1 to v6 that represent different groups. Each variable contains the values 0 and 1, which stand for the answers no and yes. My question is: how many respondents have at least 3 missing responses across questions v1 to v6?

CodePudding user response:

You can count the missing values in each row of your data.frame.

Copy and paste all of the following code, including the seed, so that you get the same simulated data.

#1- Simulation data
set.seed(1)
values=c(0,1,NA)
df=data.frame(
  v1=sample(values,10,TRUE),
  v2=sample(values,10,TRUE),
  v3=sample(values,10,TRUE),
  v4=sample(values,10,TRUE),
  v5=sample(values,10,TRUE),
  v6=sample(values,10,TRUE)
)

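#2- Number of each value by row
#Use only the question columns v1 to v6, so that the counter columns
#added below are not counted themselves
vars=paste0("v",1:6)

#Number of NA values by row
df$nbNA=apply(df[vars],1,function(x) sum(is.na(x)))

#Number of 0 values by row
df$nb0=apply(df[vars],1,function(x) sum(x==0,na.rm=TRUE))

#Number of 1 values by row
df$nb1=apply(df[vars],1,function(x) sum(x==1,na.rm=TRUE))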
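
The per-row counts above still need one more step to answer the question itself. A minimal sketch in base R, assuming the question columns are named v1 to v6 as in the question; the four-row data frame and the names `vars`, `n_missing` and `at_least_3` are made up for illustration:

```r
# Hypothetical example data: 0 = no, 1 = yes, NA = missing response
df <- data.frame(
  v1 = c(0, NA, 1, NA),
  v2 = c(1, NA, NA, NA),
  v3 = c(0, NA, 0, NA),
  v4 = c(1, 1, NA, NA),
  v5 = c(0, 0, 1, 0),
  v6 = c(1, 1, 0, 1)
)

# Restrict the count to the question columns
vars <- paste0("v", 1:6)

# is.na() gives a TRUE/FALSE matrix; rowSums() counts the TRUEs per respondent
n_missing <- rowSums(is.na(df[vars]))

# Number of respondents with at least 3 missing responses
at_least_3 <- sum(n_missing >= 3)
at_least_3
```

With your own data, `table(n_missing)` would show how many respondents have 0, 1, 2, ... missing responses.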

CodePudding user response:

Here is a solution using dplyr and tidyr (both part of the tidyverse). The final output is a tibble that lists each respondent with at least 3 missing responses, together with their number of missing responses.

library(tidyverse)

# Set the random seed so the example is reproducible
set.seed(4)

# Make some example data, I assume it looks something like this
data = tibble(
  v1 = sample(x = c("no","yes", NA), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2)),
  v2 = sample(x = c("no","yes", NA), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2)),
  v3 = sample(x = c("no","yes", NA), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2)),
  v4 = sample(x = c("no","yes", NA), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2)),
  v5 = sample(x = c("no","yes", NA), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2)),
  v6 = sample(x = c("no","yes", NA), size = 100, replace = TRUE, prob = c(0.4, 0.4, 0.2)),
  id = 1:100
  )

data
#> # A tibble: 100 x 7
#>    v1    v2    v3    v4    v5    v6       id
#>    <chr> <chr> <chr> <chr> <chr> <chr> <int>
#>  1 no    yes   no    <NA>  <NA>  no        1
#>  2 yes   no    no    no    yes   no        2
#>  3 yes   yes   no    yes   yes   yes       3
#>  4 yes   no    yes   yes   no    yes       4
#>  5 <NA>  no    yes   <NA>  yes   yes       5
#>  6 yes   no    yes   <NA>  no    <NA>      6
#>  7 no    no    no    <NA>  <NA>  yes       7
#>  8 <NA>  yes   no    <NA>  <NA>  yes       8
#>  9 <NA>  <NA>  no    yes   yes   no        9
#> 10 yes   <NA>  yes   <NA>  yes   yes      10
#> # ... with 90 more rows

# We then pivot the data into a long format
long_data = data %>% 
  pivot_longer(cols = starts_with("v"), names_to = "group", values_to = "response")

long_data
#> # A tibble: 600 x 3
#>       id group response
#>    <int> <chr> <chr>   
#>  1     1 v1    no      
#>  2     1 v2    yes     
#>  3     1 v3    no      
#>  4     1 v4    <NA>    
#>  5     1 v5    <NA>    
#>  6     1 v6    no      
#>  7     2 v1    yes     
#>  8     2 v2    no      
#>  9     2 v3    no      
#> 10     2 v4    no      
#> # ... with 590 more rows


# We then count the missing values for each individual, and keep those with at least 3
# (respondents with no missing values drop out at the first filter)
long_data %>% 
  filter(is.na(response)) %>% 
  group_by(id) %>% 
  tally() %>% 
  filter(n >= 3)
#> # A tibble: 9 x 2
#>      id     n
#>   <int> <int>
#> 1     8     3
#> 2    14     3
#> 3    19     3
#> 4    26     3
#> 5    36     3
#> 6    41     3
#> 7    49     3
#> 8    84     4
#> 9    90     3

Created on 2021-10-07 by the reprex package (v0.3.0)
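
If only the number of respondents is needed, the pivot step can be skipped. A minimal sketch, assuming the same column names v1 to v6 and an `id` column; the four-row tibble and the names `n_missing` and `n_respondents` are made up for illustration:

```r
library(dplyr)

# Hypothetical example data with the same layout as above
data <- tibble(
  id = 1:4,
  v1 = c("no", NA, "yes", NA),
  v2 = c("yes", NA, NA, NA),
  v3 = c("no", NA, "no", NA),
  v4 = c("yes", "yes", NA, NA),
  v5 = c("no", "no", "yes", "no"),
  v6 = c("yes", "yes", "no", "yes")
)

# Count missing responses per respondent across v1 to v6
n_missing <- data %>%
  select(v1:v6) %>%
  is.na() %>%
  rowSums()

# Keep respondents with at least 3 missing responses and count them
n_respondents <- data %>%
  mutate(n_missing = n_missing) %>%
  filter(n_missing >= 3) %>%
  nrow()

n_respondents
```

Unlike the long-format approach, this keeps respondents with zero missing responses in `n_missing`, which is convenient if you also want the full distribution.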
