Home > Back-end >  How to count one column based on two other columns in R?
How to count one column based on two other columns in R?

Time:04-16

I have a data frame like this:

ID  tr  tri
a   A   11
a   A   11
a   A   11
a   A   11
a   A   12
a   A   12
a   A   12
a   A   13
a   A   13
a   A   13
a   A   13
f   A   11
f   A   11
f   A   12
f   A   12
f   A   12
f   A   13
f   A   13
f   A   14
f   A   14
f   A   14
f   A   14
j   B   11
j   B   11
j   B   12
j   B   12
j   B   12
j   B   12
j   B   13
j   B   14
j   B   14
j   B   14
j   B   14
p   B   11
p   B   11
p   B   11
p   B   11
p   B   11
p   B   12
p   B   12
p   B   13
p   B   13
p   B   13
w   B   11
w   B   11
w   B   12
w   B   12
w   B   12
w   B   12
w   B   12
w   B   13
w   B   13
w   B   14

I need to know how many unique ID are there in each tri for each tr, like this:

tr  tri count
A   11  2
A   12  2
A   13  2
A   14  1
B   11  3
B   12  3
B   13  3
B   14  2

So my question is how to count one column based on two other columns? I think I need the summarize function, but I don't know how to do it based on both of my columns. Thanks.

CodePudding user response:

You need to group tr and tri together, then use summarize and n_distinct to get count the unique ID.

library(dplyr)

df %>% group_by(tr, tri) %>% summarize(count = n_distinct(ID), .groups = "drop")

# A tibble: 8 × 3
  tr      tri count
  <chr> <int> <int>
1 A        11     2
2 A        12     2
3 A        13     2
4 A        14     1
5 B        11     3
6 B        12     3
7 B        13     3
8 B        14     2
  • Related