Let's say I have a dataframe that looks like:
owner unit_id detail
abc123 002NH034 94847DT
abc123 002NH034 94868DT
abc123 002NH034 94889DT
abc123 112NH035 94899DT
abc123 112NH036
abc123 112NH037
I'm trying to roll up to the level of the first column, while counting the number of distinct values in the second column that have a value in the third column. So expected output would be:
abc123 2
I've tried a mix of sum
, is.na
, and a few other dplyr functions but am a little stuck. I've tried a variation of the following with random other functions but keep getting sums of all the rows.
df %>% group_by(owner) %>% summarise(unit_ids_with_detail = sum(!is.na(detail))
CodePudding user response:
This should do:
df %>% filter(!is.na(detail)) %>% group_by(owner) %>%
summarise(unit_ids_with_detail = n_distinct(unit_id))
Remove NAs from third column, group by first column, count distinct values on second column. This will drop owners that have only NAs on the third column, though
CodePudding user response:
If we want the distinct elements, use n_distinct
- Grouped by 'owner', use n_distinct
on the 'unit_id' and specify na.rm = TRUE
to remove the NA
elements
library(dplyr)
df %>%
group_by(owner) %>%
summarise(unit_ids_with_detail = n_distinct(detail, na.rm = TRUE))