R: Creating an "Other" Variable-CodePudding

I am working with the R Programming language.

Suppose I have the following data:

myFun <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

name = myFun(400)

variable = as.integer(abs(rnorm(400, 500,100)))

my_data = data.frame(name,variable)

I want to keep the top 5 rows (based on the value of "variable") and group (sum) everything else as "other". I thought of the following way to do this:

my_data <- my_data [order(-variable),]

my_data_top_5 = my_data[1:5,]

my_data_remainder = my_data[6:nrow(my_data),]
other_count = sum(my_data_remainder$variable)

other = data.frame( name = "other", variable = other_count)

final_result = rbind(my_data_top_5, other)

I think this worked - but is there a more efficient way to do this?

Thanks!

CodePudding user response：

In tidyverse, arrange the data based on the descending order of 'variable', replace the 'name' from 6th onwards to 'other' and do a group by sum

library(dplyr)
my_data %>%
   arrange(desc(variable)) %>%
  group_by(name = replace(name, 6:n(), "other")) %>% 
   summarise(variable = sum(variable, na.rm = TRUE), .groups = 'drop')

CodePudding user response：

Here is another approach:

library(dplyr)

my_data %>% 
  arrange(-variable) %>% 
  slice(6:n()) %>% 
  summarise(variable = sum(variable)) %>% 
  mutate(name = "other", .before=1) %>% 
  bind_rows(my_data) %>% 
  arrange(-variable) %>% 
  slice(1:6) %>% 
  arrange(variable)

        name variable
1 JRZQF6858X      724
2 DYYVV2422L      734
3 QKQRX2862B      741
4 XBINQ6194M      776
5 DZMGX4300N      796
6      other   195240