Home > Blockchain >  How to count distinct non-missing rows in R using dplyr
How to count distinct non-missing rows in R using dplyr

Time:10-18

library(dplyr)
mydat <- data.frame(id = c(123, 111, 234, "none", 123, 384, "none"),
                    id2 = c(1, 1, 1, 2, 2, 3, 4))
> mydat
    id id2
1  123   1
2  111   1
3  234   1
4 none   2
5  123   2
6  384   3
7 none   4

I would like to count the number of unique ids for each id2 in medal. However, for the id that is none, I do not want to count it.

> mydat %>% group_by(id2) %>% summarise(count = n_distinct(id))
# A tibble: 4 × 2
    id2 count
  <dbl> <int>
1     1     3
2     2     2
3     3     1
4     4     1

Using this mistakenly counts none. The desired output should be

> mydat %>% group_by(id2) %>% summarise(count = n_distinct(id))
# A tibble: 4 × 2
    id2 count
  <dbl> <int>
1     1     3
2     2     1
3     3     1
4     4     0

CodePudding user response:

mydat %>% group_by(id2) %>% 
  summarise(
    count = n_distinct(id),
    wanted = n_distinct(id[id != "none"])
  )
# # A tibble: 4 × 3
#     id2 count wanted
#   <dbl> <int>  <int>
# 1     1     3      3
# 2     2     2      1
# 3     3     1      1
# 4     4     1      0
  • Related