I have the following dataframe df1
:
company_location count
<chr> <int>
1 DE 28
2 JP 6
3 GB 47
4 HN 1
5 US 355
6 HU 1
I want to get to df2
:
company_location count
<chr> <int>
1 DE 28
2 GB 47
3 US 355
4 OTHER 8
df2
is the same as df1
but sums together all the columns with count<10
and aggregates them in a column called OTHER
Does something like this exist: A group_by() function that only groups all the rows that match a particular condition into one group and leaves all the other rows in groups only containing them alone?
Thanks!
CodePudding user response:
This is what fct_lump_min
is for - it's a function from forcats
, which is part of the tidyverse.
library(tidyverse)
df %>%
group_by(company_location = fct_lump_min(company_location, 10, count)) %>%
summarise(count = sum(count))
#> # A tibble: 4 x 2
#> company_location count
#> <fct> <int>
#> 1 DE 28
#> 2 GB 47
#> 3 US 355
#> 4 Other 8
CodePudding user response:
Make a temporary variable regrouping company_location
based on count
, then summarise
:
library(dplyr)
df1 %>%
group_by(company_location = replace(company_location, count < 10, 'OTHER')) %>%
summarise(count = sum(count))
# company_location count
# <chr> <int>
#1 DE 28
#2 GB 47
#3 OTHER 8
#4 US 355