I have a dataset that looks like this:
Product Patient_ID
1 A 1
2 A 1
3 A 1
4 A 3
5 D 3
6 D 4
7 D 5
8 E 5
9 E 6
10 F 7
11 G 8
Where I'd like to count the number of unique individuals who have used a product. In other words, I would like to get a frequency for the 'Product' column, based on unique 'Patient IDs'.
My desired dataset would look something like this:
Product Freq
1 A 2
2 D 3
3 E 2
4 F 1
5 G 1
How can I go about doing this?
Reproducible data:
test_data<-data.frame(Product=c("A","A","A","A","D","D","D","E","E","F","G"),Patient_ID=c("1","1","1","3","3","4","5","5","6","7","8"))
CodePudding user response:
base R solution: you can first remove duplicates with the !duplicated function remove the merged column's value using the paste0 function, then, use table for the product column.
rez<-as.data.frame(table(test_data[!duplicated(paste0(test_data$Product,test_data$Patient_ID)),"Product"]))
colnames(rez)[1]<-"Product"
If you have only these two column you can skip the paste0 and do:
rez<-as.data.frame(table(test_data[!duplicated(test_data),"Product"]))
CodePudding user response:
This should help
you first load the tidyverse package
Use the distinct() function to get the distinct values in the columns, then group the column based on the Products names and Use the summarize () function to get the unique count frequency
unique_count <- test_data %<% distinct(Product,Patient_ID) %<% group_by(Product)%>% summarize ("Freq" = n())
This should get you the result