Add column for distinct values and count to same tibble in R-CodePudding

I have two tibbles that I would like to combine. They are grouped by the same variable, but I would like to see them in the same table together. The first is:

df %>%
    filter(Cancelled == FALSE) %>%
    count(School)

Which gives me the count of "School":

School	count
Comm	42
IR	52
Business	34
Nursing	23

The next is:

df%>%
    filter(Cancelled == FALSE) %>%
    group_by(School) %>%
    summarise(n_distinct(ID))

Which gives me the count of unique "ID" values in each "School".:

School	unique
Comm	17
IR	18
Business	14
Nursing	12

Basically, what I would like is for the count to be one row and the unique value count to be a second row:

School	count	unique
Comm	17	42
IR	18	52
Business	14	34
Nursing	12	23

Thank you in advance!

*Edit: better description of original data

dput(data)
structure(list(ID = c(1986, 3707, 2467, 3087, 2155, 3133, 2531, 
3112, 2042, 2912, 1305, 1519, 2411, 3630, 2015, 2943, 2873, 1591, 
3127, 3733, 3492, 3156, 3907, 3877, 2050, 2956, 1280, 3544, 1465, 
1410, 3946, 2868, 2288, 3722, 1611, 3188, 3609, 2847, 1803, 2580, 
1928, 1775, 2774, 1259, 3851, 2135, 3046, 1480, 2480, 2240, 3279, 
3983, 2042, 3754, 1851, 3528, 3161, 2547, 3068, 2739, 3936, 3290, 
2465, 2839, 2139, 2635, 1655, 3903, 2333, 1787, 2913, 2764, 2791, 
1501, 2101, 3312, 3428, 3502, 1826, 3823, 3064, 2705, 1917, 1427, 
1627, 1519, 3811, 3661, 3034, 1977, 2502, 3240, 1575, 2882, 3651, 
2065, 2366, 2016, 2991, 1996), School = c("Nursing", "Business", 
"Comm", "Nursing", "Business", "Nursing", "Nursing", "Nursing", 
"Nursing", "Nursing", "IR", "Comm", "Nursing", "IR", "Nursing", 
"Comm", "Business", "Business", "Business", "Nursing", "Nursing", 
"Nursing", "Comm", "Nursing", "Business", "Nursing", "Comm", 
"Business", "IR", "IR", "Nursing", "Business", "Business", "IR", 
"Business", "Business", "Business", "Comm", "Nursing", "Comm", 
"IR", "Nursing", "Nursing", "Nursing", "Nursing", "Comm", "Nursing", 
"Business", "IR", "Comm", "Comm", "Business", "IR", "Nursing", 
"Nursing", "IR", "Comm", "Business", "IR", "IR", "Nursing", "IR", 
"Nursing", "Nursing", "Nursing", "Business", "Comm", "Nursing", 
"IR", "IR", "Business", "Comm", "IR", "Nursing", "Nursing", "Business", 
"Nursing", "Comm", "Business", "Business", "Nursing", "Nursing", 
"Nursing", "Nursing", "Nursing", "Nursing", "Comm", "Nursing", 
"IR", "Business", "Nursing", "Comm", "Nursing", "Comm", "Nursing", 
"Nursing", "IR", "Business", "Nursing", "Comm")), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response：

You can do everything in one pipe, but it doesn‘t necessarily look cleaner:

library(tidyverse)
data %>%
  count(School, name = 'count') %>%
  left_join(., data %>%
                 group_by(School) %>%
                 summarize(unique = n_distinct(ID)),
            by = 'School')

which gives with your example data:

# A tibble: 4 x 3
  School   count unique
  <chr>    <int>  <int>
1 Business    22     22
2 Comm        18     18
3 IR          17     17
4 Nursing     43     43

I guess/assume it‘s just coincidence of your example data that there are no duplicate IDs per school and hence the count and unique value are the same.

CodePudding user response：

We could use left_join:

library(dplyr)
left_join(df, df1, by="School")

    School count unique
1     Comm    42     17
2       IR    52     18
3 Business    34     14
4  Nursing    23     12