I have a table with ID, Category and amount with a few thousand records.
data:
df1 <- data.frame(
ID = c('V1', 'V1', 'V1', 'V3', 'V3', 'V3', 'V4', 'V5','V5','V5'),
Category = c('a', 'a', 'a', 'a', 'b', 'b', 'a', 'b', 'c', 'c'),
Amount = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1))
Using dplyr I want to group by ID and Category, sum the total amount per group, then filter the results to only have IDs which exist in multiple category.
result:
ID Category Amount_Sum
V3 a 1
V3 b 2
V5 b 1
V5 c 2
I have the following code which groups and sums, but missing how to filter when the ID is in multiple groups
code:
x <- df1 %>%
group_by(ID, Category) %>%
summarize(CNT = n(), amount = sum(Amount)) %>%
filter(????????)
CodePudding user response:
Using n_distinct
on the Category
should give you your desired result:
library(dplyr)
df1 %>%
group_by(ID, Category) %>%
summarize(CNT = n(), amount = sum(Amount)) %>%
filter(n_distinct(Category) > 1) %>%
ungroup()
returns
# A tibble: 4 x 4
ID Category CNT amount
<chr> <chr> <int> <dbl>
1 V3 a 1 1
2 V3 b 2 2
3 V5 b 1 1
4 V5 c 2 2
CodePudding user response:
You can also use a combination of length
and unique
to filter
as well:
library(dplyr)
df1 %>%
group_by(ID, Category) %>%
summarize(CNT = n(), amount = sum(Amount)) %>%
filter(length(unique(Category)) > 1)
Output
ID Category CNT amount
<chr> <chr> <int> <dbl>
1 V3 a 1 1
2 V3 b 2 2
3 V5 b 1 1
4 V5 c 2 2
Or here is a base R option using aggregate
to do the summary, then using ave
to do the filtering.
df1_output <-
setNames(do.call(
data.frame,
aggregate(
Amount ~ ID Category,
data = df1,
FUN = function(x)
c(CNT = length(x), amount = sum(x))
)
), c(names(df1[1:2]), "CNT", "amount"))
df1_output[with(df1_output, ave(Category, ID, FUN = function(x) length(unique(x))) > 1),]