I have a data frame like this:
ID tr tri
a A 11
a A 11
a A 11
a A 11
a A 12
a A 12
a A 12
a A 13
a A 13
a A 13
a A 13
f A 11
f A 11
f A 12
f A 12
f A 12
f A 13
f A 13
f A 14
f A 14
f A 14
f A 14
j B 11
j B 11
j B 12
j B 12
j B 12
j B 12
j B 13
j B 14
j B 14
j B 14
j B 14
p B 11
p B 11
p B 11
p B 11
p B 11
p B 12
p B 12
p B 13
p B 13
p B 13
w B 11
w B 11
w B 12
w B 12
w B 12
w B 12
w B 12
w B 13
w B 13
w B 14
I need to know how many unique ID are there in each tri for each tr, like this:
tr tri count
A 11 2
A 12 2
A 13 2
A 14 1
B 11 3
B 12 3
B 13 3
B 14 2
So my question is how to count one column based on two other columns? I think I need the summarize
function, but I don't know how to do it based on both of my columns.
Thanks.
CodePudding user response:
You need to group tr
and tri
together, then use summarize
and n_distinct
to get count the unique ID
.
library(dplyr)
df %>% group_by(tr, tri) %>% summarize(count = n_distinct(ID), .groups = "drop")
# A tibble: 8 × 3
tr tri count
<chr> <int> <int>
1 A 11 2
2 A 12 2
3 A 13 2
4 A 14 1
5 B 11 3
6 B 12 3
7 B 13 3
8 B 14 2