I have a dataset that looks like this:
Study_ID Dose
1 100 Yes
2 100 Yes
3 200 No
4 300 No
5 400 No
6 500 Yes
7 500 Yes
8 800 No
9 800 No
10 900 Yes
Where some of the participant study IDs are duplicated. I want to count the number of 'Yes' or 'No' doses, but only for a unique study ID. In other words, for the participants who are duplicated, the 'Yes' or 'No' dose should only be counted once.
The output should be:
No Yes
4 3
Where each study ID is only counted once.
I've tried to use dplyr's n_distinct, but it does not seem to be working:
n_distinct(data$Study_ID, data$Dose == "Yes")
How can I go about doing this?
Reproducible Data:
data<-data.frame(Study_ID=c("100","100","200","300","400","500","500","800","800","900"),Dose=c("Yes","Yes","No","No","No","Yes","Yes","No","No","Yes"))
CodePudding user response:
This can be done in two steps, first removing duplicate observations of the same dose for the same id, then counting:
data %>%
distinct(Study_ID, Dose) %>%
count(Dose)
Result
Dose n
1 No 4
2 Yes 3
CodePudding user response:
In base
R you can do:
table(unique(data[, c("Study_ID", "Dose")])[, "Dose"])
Output:
No Yes
4 3