Group all rows that meet a certain condition-CodePudding

I have the following dataframe df1:

  company_location count
  <chr>            <int>
1 DE                  28
2 JP                   6
3 GB                  47
4 HN                   1
5 US                 355
6 HU                   1

I want to get to df2:

  company_location count
  <chr>            <int>
1 DE                  28
2 GB                  47
3 US                 355
4 OTHER                8

df2 is the same as df1 but sums together all the columns with count<10 and aggregates them in a column called OTHER

Does something like this exist: A group_by() function that only groups all the rows that match a particular condition into one group and leaves all the other rows in groups only containing them alone?

Thanks!

CodePudding user response：

This is what fct_lump_min is for - it's a function from forcats, which is part of the tidyverse.

library(tidyverse)

df %>%
  group_by(company_location = fct_lump_min(company_location, 10, count)) %>%
  summarise(count = sum(count))

#> # A tibble: 4 x 2
#>   company_location count
#>   <fct>            <int>
#> 1 DE                  28
#> 2 GB                  47
#> 3 US                 355
#> 4 Other                8

CodePudding user response：

Make a temporary variable regrouping company_location based on count, then summarise:

library(dplyr)
df1 %>% 
    group_by(company_location = replace(company_location, count < 10, 'OTHER')) %>% 
    summarise(count = sum(count))

#  company_location count
#  <chr>            <int>
#1 DE                  28
#2 GB                  47
#3 OTHER                8
#4 US                 355