In R, how to collapse the tail end of an ordered dataframe into "n or more"?

Time:08-04

Background

I've got a dataframe d:

d <- data.frame(count_events = c(0,1,2,3),
                count_people = c(123,56,2,8),
                stringsAsFactors=FALSE)  

> d
  count_events count_people
1            0          123
2            1           56
3            2            2
4            3            8

The problem

I'd like a way to collapse rows 3 and 4 of d into one row that says something like "2 or more" in count_events and sums 2 and 8 into 10 for count_people. So in other words something like this:

> d
  count_events count_people
1            0          123
2            1           56
3          >=2           10

What I've tried

I can do what I want with rbind, like so:

d <- d[-c(3:4),]
new_row <- c('2 or more', 10)
d <- rbind(d, new_row)

> d
  count_events count_people
1            0          123
2            1           56
3    2 or more           10

But I'm wondering if there's something faster or more elegant. The "real" dataframe I'm trying to do this on has many rows, and many rows I'd like to collapse. It's doable, but it'll take me a decent chunk of time and I thought to myself "is there a dedicated function for something like this?". Thanks.
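A side note on the rbind attempt: c('2 or more', 10) is a character vector, so after rbind both columns should end up as character and count_people can no longer be summed. A minimal sketch of a variant that keeps count_people numeric is below. It assumes the tail is defined by count_events >= 2; the names tail_rows and new_row are illustrative only.

```r
d <- data.frame(count_events = c(0, 1, 2, 3),
                count_people = c(123, 56, 2, 8),
                stringsAsFactors = FALSE)

# Flag the rows to collapse while count_events is still numeric
tail_rows <- d$count_events >= 2

# Build the replacement row as a one-row data frame so each column keeps its own type
new_row <- data.frame(count_events = "2 or more",
                      count_people = sum(d$count_people[tail_rows]),
                      stringsAsFactors = FALSE)

# The label column has to be character to hold "2 or more"
d$count_events <- as.character(d$count_events)

# Keep the untouched rows and append the collapsed row
d <- rbind(d[!tail_rows, ], new_row)
```

This still hard-codes the label, but the sum is computed from the data rather than typed in by hand.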

CodePudding user response:

You could do:

library(dplyr, warn.conflicts = FALSE)

d |> 
  mutate(count_events = ifelse(row_number() >= 3, "2 or more", count_events)) |> 
  count(count_events, wt = count_people)
#>   count_events   n
#> 1            0 123
#> 2            1  56
#> 3    2 or more  10

CodePudding user response:

A data.table equivalent to stefan's dplyr answer above:

library(data.table)
setDT(d)[, .(count_people = sum(count_people)),
  keyby = .(count_events =
    fifelse(count_events >= 2, "2 or more", as.character(count_events)))]

#    count_events count_people
# 1:            0          123
# 2:            1           56
# 3:    2 or more           10
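One thing to watch with a larger cutoff: once count_events is a character column, sorting it is alphabetical, so "10" would sort before "2". A possible workaround, sketched here in base R, is to store the labels as a factor whose levels are in the intended order. The data frame d2, its values and the cutoff of 11 are made up purely for illustration.

```r
# Hypothetical longer table: 0 to 12 events, 5 people in each row
d2 <- data.frame(count_events = 0:12,
                 count_people = rep(5, 13))

threshold <- 11  # assumed cutoff for this example

# Labels in the intended display order: "0", "1", ..., "10", "11 or more"
bucket_labels <- c(as.character(0:(threshold - 1)),
                   paste(threshold, "or more"))

# Cap values at the threshold, then map each capped value to its label
d2$bucket <- factor(pmin(d2$count_events, threshold),
                    levels = 0:threshold,
                    labels = bucket_labels)

# aggregate() returns the groups in factor-level order, not alphabetical order
out2 <- aggregate(count_people ~ bucket, data = d2, FUN = sum)
```

The same factor column could presumably be passed to the dplyr or data.table grouping in place of the character label.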