Select rows within an overlapping range based on another column in R tidyverse

Time:07-28

I have a data frame that looks like the one below. col1 defines the start of a range when direction is " ", while col2 defines the start of a range when direction is "-".

library(tidyverse)
df <- tibble(col1=c(1,10,100,40,1000), col2=c(15,20,50,80,2000), 
             direction=c(" "," ","-"," "," "), score=c(50,100,300,10,300))
df 
#> # A tibble: 5 × 4
#>    col1  col2 direction score
#>   <dbl> <dbl> <chr>     <dbl>
#> 1     1    15              50
#> 2    10    20             100
#> 3   100    50 -           300
#> 4    40    80              10
#> 5  1000  2000             300

Created on 2022-07-28 by the reprex package (v2.0.1)

Taking the direction into account, I want to keep, from each set of rows with overlapping ranges, the row with the highest score.

I want my data to look like this.


#>    col1  col2 direction score
#>   <dbl> <dbl> <chr>     <dbl>
#> 1    10    20             100
#> 3   100    50 -           300
#> 5  1000  2000             300

Any ideas and help are highly appreciated.

Answer:

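We could use slice_max() after grouping by rleid() (from data.table) applied to 'direction'. This treats each run of consecutive rows with the same direction as one group and keeps the highest-scoring row in each group.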
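To make the grouping key concrete, here is a small sketch of what rleid() returns for the direction column from the question. The variable names direction and run_id are used only for this illustration.

```r
library(data.table)

# The 'direction' column from the question's data frame
direction <- c(" ", " ", "-", " ", " ")

# rleid() assigns a new integer id each time the value changes
# from one element to the next
run_id <- rleid(direction)
print(run_id)
```

Rows 1 and 2 share one id, row 3 gets its own, and rows 4 and 5 share a third, so the pipeline that follows picks one row from each of those three runs.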

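library(dplyr)
library(data.table)  # for rleid()

df %>% 
  # rleid() gives each run of consecutive identical 'direction' values its own id
  group_by(grp = rleid(direction)) %>%
  # keep the row with the highest score in each run (tied rows are all kept)
  slice_max(n = 1, order_by = score) %>%
  ungroup() %>%
  select(-grp)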

Output:

# A tibble: 3 × 4
   col1  col2 direction score
  <dbl> <dbl> <chr>     <dbl>
1    10    20             100
2   100    50 -           300
3  1000  2000             300
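
The pipeline above groups on runs of direction and does not compare col1 and col2 directly. If the grouping should follow the ranges themselves, one possible alternative is sketched below. It is not part of the original answer and rests on a few assumptions: the range start is col2 when direction is "-" and col1 otherwise, ranges that merely touch count as overlapping, and chains of overlapping ranges are merged into a single group. The helper columns row_id, range_start, range_end and grp are names introduced here for illustration.

```r
library(dplyr)

df <- tibble(col1 = c(1, 10, 100, 40, 1000), col2 = c(15, 20, 50, 80, 2000),
             direction = c(" ", " ", "-", " ", " "), score = c(50, 100, 300, 10, 300))

result <- df %>%
  mutate(
    row_id      = row_number(),                         # remember the original order
    range_start = if_else(direction == "-", col2, col1),
    range_end   = if_else(direction == "-", col1, col2)
  ) %>%
  arrange(range_start) %>%
  # a new group begins when a range starts after every earlier range has ended
  mutate(grp = cumsum(range_start > lag(cummax(range_end), default = -Inf))) %>%
  group_by(grp) %>%
  # keep one row per overlap group; with_ties = FALSE picks the first on a tie
  slice_max(score, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(row_id) %>%
  select(col1, col2, direction, score)

print(result)
```

On the sample data this keeps the rows with col1 equal to 10, 100 and 1000, the same rows as the desired output in the question. Unlike the run-based version, it would also handle overlapping rows that are not adjacent in the data frame.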