I have a dataset that records the products associated with certain accounts. I want to summarise the total number of accounts for a specific set of products, only counting each account number once, no matter how many products they have. So the total for this sample would be 4. (a b c d)
Account | Product |
---|---|
a | 1 |
a | 2 |
b | 1 |
c | 1 |
c | 2 |
d | 3 |
The code I have tried so far is
filter(Product == 1 | 2 | 3) %>%
summarise(total = n_distinct(), .groups = Account)
This gives message Error in summarise_verbose(.groups, caller_env()) : object 'Account' not found
I also tried
filter(Product == 1 | 2 | 3) %>%
summarise(total = n_distinct(Account))
But this doesn't reduce the number of rows properly - I'm still getting 300,000 rows when I should get 70,000 based on other data I have. Is there a way of counting the (alphanumeric) account numbers once and once only, no matter what the products are?
CodePudding user response:
In the absence of an example of minimal data, I suppose you want to count the different elements by groups and using filters.
Data %>%
filter(Product %in% c(1,2,3)) %>%
group_by(Account) %>%
summarise(
total = n_distinct(Product)
)
CodePudding user response:
You were close
df %>%
group_by(Account) %>%
summarise(
total = n_distinct(Product)
)