R: Creating an "Other" Variable

Time:08-27

I am working with the R programming language.

Suppose I have the following data:

myFun <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

name = myFun(400)

variable = as.integer(abs(rnorm(400, 500,100)))

my_data = data.frame(name,variable)

I want to keep the top 5 rows (based on the value of "variable") and group (sum) everything else as "other". I thought of the following way to do this:

my_data <- my_data[order(-my_data$variable), ]

my_data_top_5 = my_data[1:5,]

my_data_remainder = my_data[6:nrow(my_data),]
other_count = sum(my_data_remainder$variable)

other = data.frame( name = "other", variable = other_count)

final_result = rbind(my_data_top_5, other)

I think this worked - but is there a more efficient way to do this?

Thanks!

CodePudding user response:

With the tidyverse, arrange the data in descending order of 'variable', replace 'name' with 'other' from the 6th row onwards, then group by 'name' and sum. Note that summarise() returns the groups sorted by 'name', not by 'variable'.

library(dplyr)
my_data %>%
  arrange(desc(variable)) %>%
  group_by(name = replace(name, 6:n(), "other")) %>%
  summarise(variable = sum(variable, na.rm = TRUE), .groups = 'drop')
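
To see what the replace() step is doing, here is a minimal base R sketch on a small made-up data frame. The `toy` data and its values are assumptions for illustration only, not taken from the question. It relabels every row after the fifth as "other" and then sums by label:

```r
# Toy data (hypothetical values), small enough to check by hand
toy <- data.frame(
  name     = c("A", "B", "C", "D", "E", "F", "G", "H"),
  variable = c(10, 80, 30, 70, 20, 60, 50, 40),
  stringsAsFactors = FALSE
)

# Sort by descending value
toy <- toy[order(-toy$variable), ]

# Relabel rows 6..n as "other"; this assumes the data has more than 5 rows
toy$name <- replace(toy$name, 6:nrow(toy), "other")

# Sum per label; aggregate() returns the groups sorted by name
result <- aggregate(variable ~ name, data = toy, FUN = sum)
print(result)
```

The three smallest values (30, 20, 10) end up in a single "other" row with a total of 60, while the five largest rows keep their own names.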

CodePudding user response:

Here is another approach:

library(dplyr)

my_data %>% 
  arrange(-variable) %>% 
  slice(6:n()) %>% 
  summarise(variable = sum(variable)) %>% 
  mutate(name = "other", .before=1) %>% 
  bind_rows(my_data) %>% 
  arrange(-variable) %>% 
  slice(1:6) %>% 
  arrange(variable)

        name variable
1 JRZQF6858X      724
2 DYYVV2422L      734
3 QKQRX2862B      741
4 XBINQ6194M      776
5 DZMGX4300N      796
6      other   195240
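
For comparison, the base R idea from the question could also be wrapped in a small function. This is only a sketch: the helper name `top_n_with_other`, its arguments, and the `demo` data are hypothetical, and it assumes the columns are called `name` and `variable` as in the question. Using head() and tail() avoids the problem that `6:nrow(my_data)` counts backwards when there are fewer than 6 rows.

```r
# Hypothetical helper: keep the n largest rows, collapse the rest into one row
top_n_with_other <- function(df, n = 5, label = "other") {
  df   <- df[order(-df$variable), ]   # sort by descending value
  top  <- head(df, n)                 # first n rows
  rest <- tail(df, -n)                # everything except the first n rows
  if (nrow(rest) == 0) {
    return(top)                       # nothing left to lump together
  }
  other <- data.frame(name = label,
                      variable = sum(rest$variable),
                      stringsAsFactors = FALSE)
  out <- rbind(top, other)
  rownames(out) <- NULL               # drop the leftover row names
  out
}

# Made-up example data (assumption, not the question's data)
set.seed(1)
demo <- data.frame(name = paste0("id", 1:20),
                   variable = sample(100:900, 20),
                   stringsAsFactors = FALSE)

res <- top_n_with_other(demo)
print(res)
```

The total of `variable` is preserved, so `sum(res$variable)` should equal `sum(demo$variable)`, which is a quick sanity check for this kind of lumping.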