Home > OS >  How to count number of unique occurrences based on multiple columns
How to count number of unique occurrences based on multiple columns

Time:06-25

I have a dataset that looks like this:

   Study_ID Dose
1       100  Yes
2       100  Yes
3       200   No
4       300   No
5       400   No
6       500  Yes
7       500  Yes
8       800   No
9       800   No
10      900  Yes

Where some of the participant study IDs are duplicated. I want to count the number of 'Yes' or 'No' doses, but only for a unique study ID. In other words, for the participants who are duplicated, the 'Yes' or 'No' dose should only be counted once.

The output should be:

 No Yes 
  4  3

Where each study ID is only counted once.

I've tried to use dplyr's n_distinct, but it does not seem to be working:

n_distinct(data$Study_ID, data$Dose == "Yes")

How can I go about doing this?

Reproducible Data:

data<-data.frame(Study_ID=c("100","100","200","300","400","500","500","800","800","900"),Dose=c("Yes","Yes","No","No","No","Yes","Yes","No","No","Yes"))

CodePudding user response:

This can be done in two steps, first removing duplicate observations of the same dose for the same id, then counting:

data %>% 
  distinct(Study_ID, Dose) %>% 
  count(Dose)

Result

  Dose n
1   No 4
2  Yes 3

CodePudding user response:

In base R you can do:

table(unique(data[, c("Study_ID", "Dose")])[, "Dose"])

Output:

 No Yes 
  4   3
  • Related