I have a data frame that looks like the one below. col1 defines the start of a range when the direction is " ", while col2 defines the start of a range when the direction is "-".
library(tidyverse)
df <- tibble(col1 = c(1, 10, 100, 40, 1000), col2 = c(15, 20, 50, 80, 2000),
             direction = c(" ", " ", "-", " ", " "), score = c(50, 100, 300, 10, 300))
df
#> # A tibble: 5 × 4
#>    col1  col2 direction score
#>   <dbl> <dbl> <chr>     <dbl>
#> 1     1    15              50
#> 2    10    20             100
#> 3   100    50 -           300
#> 4    40    80              10
#> 5  1000  2000             300
Created on 2022-07-28 by the reprex package (v2.0.1)
Taking the direction into account, I want to keep, from each set of rows with overlapping ranges, only the row with the highest score.
I want my data to look like this.
#>    col1  col2 direction score
#>   <dbl> <dbl> <chr>     <dbl>
#> 1    10    20             100
#> 3   100    50 -           300
#> 5  1000  2000             300
Any ideas and help are highly appreciated.
CodePudding user response:
We could use slice_max() after grouping by rleid() on the 'direction' column.
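As a quick illustration of the grouping key (a sketch using the direction values from the question): rleid() from data.table gives consecutive equal values the same id and starts a new id whenever the value changes.

```r
library(data.table)

# direction values copied from the question's data frame
direction <- c(" ", " ", "-", " ", " ")

# consecutive identical values share an id; a change starts a new id
ids <- rleid(direction)
ids
#> [1] 1 1 2 3 3
```

Rows 4 and 5 share an id only because they are adjacent and have the same direction; rleid() does not look at col1 or col2.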
library(dplyr)
library(data.table)
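df %>%
  # consecutive rows with the same direction share a group id
  group_by(grp = rleid(direction)) %>%
  # keep the highest-scoring row in each group
  slice_max(n = 1, order_by = score) %>%
  ungroup() %>%
  select(-grp)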
Output:
# A tibble: 3 × 4
   col1  col2 direction score
  <dbl> <dbl> <chr>     <dbl>
1    10    20             100
2   100    50 -           300
3  1000  2000             300
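The grouping above relies on row order and runs of equal direction rather than on the ranges themselves. If overlap needs to be checked on the actual intervals, one possible sketch (not part of the original answer) is to normalise each row to start/end with pmin()/pmax(), sort by start, and open a new cluster whenever a range starts after all earlier ranges have ended. The helper columns row, start, end and grp are names introduced here for illustration, and treating ranges that merely touch as overlapping is an assumption.

```r
library(dplyr)
library(tibble)

df <- tibble(col1 = c(1, 10, 100, 40, 1000), col2 = c(15, 20, 50, 80, 2000),
             direction = c(" ", " ", "-", " ", " "), score = c(50, 100, 300, 10, 300))

res <- df %>%
  mutate(row   = row_number(),        # remember the original row order
         start = pmin(col1, col2),    # assumption: a range spans min..max,
         end   = pmax(col1, col2)) %>% # whichever column the direction marks as the start
  arrange(start) %>%
  # a new cluster begins when a range starts after every earlier range has ended
  mutate(grp = cumsum(start > lag(cummax(end), default = -Inf))) %>%
  group_by(grp) %>%
  # keep one highest-scoring row per cluster (ties broken by position)
  slice_max(score, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(row) %>%
  select(col1, col2, direction, score)

res
```

On the sample data this keeps the rows with scores 100, 300 and 300, the same rows as the desired output. Unlike the rleid() approach, it would also handle overlapping rows that are not adjacent in the data frame.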