I am working with a dataframe with thousands of responses to questions about interest in a set of resources. I want to summarize how many participants are interested in a given resource by counting the number of positive responses (coded as "1").
As a final step, I would like to suppress any answer where <5 participants responded.
I've created code that works, but its clunky when I'm dealing with dozens of fields. So, I'm looking for suggestions for a more streamlined approach, perhaps using piping or dplyr?
Example Input
ID | Resource1 | Resource2 | Resource3 | Resource4 |
---|---|---|---|---|
1 | 1 | 0 | 1 | 1 |
2 | 0 | 0 | 0 | 1 |
3 | 1 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 |
5 | 1 | 1 | 1 | 1 |
Desired output
Interested | Not Interested | |
---|---|---|
Resource1 | 3 | 2 |
Resource2 | 1 | 4 |
Resource3 | 2 | 3 |
Resource4 | 3 | 2 |
My (ugly) code
###Select and summarise relevent columns
resource1 <- df %>% drop_na(resource1) %>% group_by(resource1) %>% summarise(n=n()) %>% rename(resp=resource1, r1 =n)
resource2 <- df %>% drop_na(resource2) %>% group_by(resource2) %>% summarise(n=n()) %>% rename(resp=resource2, r2 =n)
resource3 <- df %>% drop_na(resource3) %>% group_by(resource3) %>% summarise(n=n()) %>% rename(resp=resource3, r3 =n)
resource4 <- df %>% drop_na(resource4) %>% group_by(resource4) %>% summarise(n=n()) %>% rename(resp=resource4, r4 =n)
###Merge summarised data
resource_sum <-join_all(list(resource1,resource2,resource3,resource4), by=c("resp"))
###Replace all values less than 5 with NA per suppression rules.
resource_sum <- apply(resource_sum, function(x) ifelse(x<5, "NA", x))
resource_sum <-as.data.frame(resource_sum)
CodePudding user response:
We may reshape into 'long' format with pivot_longer
and then do a group by summarise
to get the count of 1s and 0s
library(dplyr)
library(tidyr)
library(tibble)
df %>%
pivot_longer(cols = -ID) %>%
group_by(name) %>%
summarise(Interested = sum(value), NotInterested = n() - Interested) %>%
column_to_rownames('name')
-output
Interested NotInterested
Resource1 3 2
Resource2 1 4
Resource3 2 3
Resource4 3 2
Or using base R
v1 <- colSums(df[-1])
cbind(Interested = v1, NotInterested = nrow(df) - v1)
-output
Interested NotInterested
Resource1 3 2
Resource2 1 4
Resource3 2 3
Resource4 3 2
data
df <- structure(list(ID = 1:5, Resource1 = c(1L, 0L, 1L, 0L, 1L),
Resource2 = c(0L,
0L, 0L, 0L, 1L), Resource3 = c(1L, 0L, 0L, 0L, 1L), Resource4 = c(1L,
1L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA, -5L
))
CodePudding user response:
You can use table
to get counts of 0 and 1 value. To apply the function (table
) to multiple columns you can use sapply
-
t(sapply(df[-1], table))
# 0 1
#Resource1 2 3
#Resource2 4 1
#Resource3 3 2
#Resource4 2 3