Home > Software engineering >  Add column for distinct values and count to same tibble in R
Add column for distinct values and count to same tibble in R

Time:11-21

I have two tibbles that I would like to combine. They are grouped by the same variable, but I would like to see them in the same table together. The first is:

df %>%
    filter(Cancelled == FALSE) %>%
    count(School)

Which gives me the count of "School":

School count
Comm 42
IR 52
Business 34
Nursing 23

The next is:

df%>%
    filter(Cancelled == FALSE) %>%
    group_by(School) %>%
    summarise(n_distinct(ID))

Which gives me the count of unique "ID" values in each "School".:

School unique
Comm 17
IR 18
Business 14
Nursing 12

Basically, what I would like is for the count to be one row and the unique value count to be a second row:

School count unique
Comm 17 42
IR 18 52
Business 14 34
Nursing 12 23

Thank you in advance!

*Edit: better description of original data

dput(data)
structure(list(ID = c(1986, 3707, 2467, 3087, 2155, 3133, 2531, 
3112, 2042, 2912, 1305, 1519, 2411, 3630, 2015, 2943, 2873, 1591, 
3127, 3733, 3492, 3156, 3907, 3877, 2050, 2956, 1280, 3544, 1465, 
1410, 3946, 2868, 2288, 3722, 1611, 3188, 3609, 2847, 1803, 2580, 
1928, 1775, 2774, 1259, 3851, 2135, 3046, 1480, 2480, 2240, 3279, 
3983, 2042, 3754, 1851, 3528, 3161, 2547, 3068, 2739, 3936, 3290, 
2465, 2839, 2139, 2635, 1655, 3903, 2333, 1787, 2913, 2764, 2791, 
1501, 2101, 3312, 3428, 3502, 1826, 3823, 3064, 2705, 1917, 1427, 
1627, 1519, 3811, 3661, 3034, 1977, 2502, 3240, 1575, 2882, 3651, 
2065, 2366, 2016, 2991, 1996), School = c("Nursing", "Business", 
"Comm", "Nursing", "Business", "Nursing", "Nursing", "Nursing", 
"Nursing", "Nursing", "IR", "Comm", "Nursing", "IR", "Nursing", 
"Comm", "Business", "Business", "Business", "Nursing", "Nursing", 
"Nursing", "Comm", "Nursing", "Business", "Nursing", "Comm", 
"Business", "IR", "IR", "Nursing", "Business", "Business", "IR", 
"Business", "Business", "Business", "Comm", "Nursing", "Comm", 
"IR", "Nursing", "Nursing", "Nursing", "Nursing", "Comm", "Nursing", 
"Business", "IR", "Comm", "Comm", "Business", "IR", "Nursing", 
"Nursing", "IR", "Comm", "Business", "IR", "IR", "Nursing", "IR", 
"Nursing", "Nursing", "Nursing", "Business", "Comm", "Nursing", 
"IR", "IR", "Business", "Comm", "IR", "Nursing", "Nursing", "Business", 
"Nursing", "Comm", "Business", "Business", "Nursing", "Nursing", 
"Nursing", "Nursing", "Nursing", "Nursing", "Comm", "Nursing", 
"IR", "Business", "Nursing", "Comm", "Nursing", "Comm", "Nursing", 
"Nursing", "IR", "Business", "Nursing", "Comm")), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

You can do everything in one pipe, but it doesn‘t necessarily look cleaner:

library(tidyverse)
data %>%
  count(School, name = 'count') %>%
  left_join(., data %>%
                 group_by(School) %>%
                 summarize(unique = n_distinct(ID)),
            by = 'School')

which gives with your example data:

# A tibble: 4 x 3
  School   count unique
  <chr>    <int>  <int>
1 Business    22     22
2 Comm        18     18
3 IR          17     17
4 Nursing     43     43

I guess/assume it‘s just coincidence of your example data that there are no duplicate IDs per school and hence the count and unique value are the same.

CodePudding user response:

We could use left_join:

library(dplyr)
left_join(df, df1, by="School")
    School count unique
1     Comm    42     17
2       IR    52     18
3 Business    34     14
4  Nursing    23     12
  • Related