I have a dataset that I want to summarize. First, I want the sum of the home and away games, which I can do. However, I also want to know how many outliers (defined as more than 300 points) are within each subcategory (home, away).
If I wasn't using summarize, I know dplyr
has the count()
function, but I'd like this solution to appear in my summarize()
call. Here's what I have and what I've tried, which fails to perform:
#Test data
library(dplyr)
test <- tibble(score = c(100, 150, 200, 301, 150, 345, 102, 131),
location = c("home", "away", "home", "away", "home", "away", "home", "away"),
more_than_300 = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))
#attempt 1, count rows that match a criteria
test %>%
group_by(location) %>%
summarize(total_score = sum(score),
n_outliers = nrow(.[more_than_300 == FALSE]))
CodePudding user response:
You can use sum
on logical vectors - it will automatically convert them into numeric values (TRUE
being equal to 1 and FALSE
being equal to 0), so you need only do:
test %>%
group_by(location) %>%
summarize(total_score = sum(score),
n_outliers = sum(more_than_300))
#> # A tibble: 2 x 3
#> location total_score n_outliers
#> <chr> <dbl> <int>
#> 1 away 927 2
#> 2 home 552 0
Or, if these are your only 3 columns, an equivalent would be:
test %>%
group_by(location) %>%
summarize(across(everything(), sum))
In fact, you don't need to make the more_than_300
column - it would suffice to do:
test %>%
group_by(location) %>%
summarize(total_score = sum(score),
n_outliers = sum(score > 300))
CodePudding user response:
In base R, we can try aggregate
like this
> aggregate(.~location,test,sum)
location score more_than_300
1 away 927 2
2 home 552 0
CodePudding user response:
In base xtabs
could be used to sum up per group.
xtabs(cbind(score, more_than_300) ~ ., test)
#location score more_than_300
# away 927 2
# home 552 0
Or by calculating the outliers on the fly and giving desired column names.
xtabs(cbind(total_score = score, n_outliers = score > 300) ~ location, test)
#location total_score n_outliers
# away 927 2
# home 552 0
Another option, also in base, will be rowsum
.
with(test, rowsum(cbind(total_score = score, n_outliers = score > 300), location))
# total_score n_outliers
#away 927 2
#home 552 0
xtabs
and rowsum
are specialized in calculating sums per group and might be performant in this task.