Proper preprocessing in R

I have a dataset like this:

FID	osmid	s	e	seg_length
0	4999	733	99	7.7
1	566	733	33	3.2
2	499	713	96	7.7
3	56	783	32	3.5
4	409	783	98	7.6
5	516	736	38	3.5
6	459	739	98	7.7
7	526	731	33	3.2

s stands for the starting point and e for the ending point. Some FIDs share the same start or end point. I want to keep only one row per start point and per end point, so when two rows share a starting point I want to keep only the one with the shortest seg_length. I couldn't find good code for that. Every start and end point should appear only once.

For example, FID 0 and FID 1 share the same starting point, and in the new dataset only FID 1 should remain. Likewise, FID 4 and FID 6 share the same end point, and in the new dataset only FID 4 should remain.

CodePudding user response：

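We may use slice_min after grouping by 's'. Note that this keeps the shortest segment per start point only; the end point 'e' is not deduplicated, and rows tied for the minimum are all kept by default.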

library(dplyr)
df1 %>%
  group_by(s) %>%
  # keep the row with the smallest seg_length within each start point
  slice_min(n = 1, order_by = seg_length) %>%
  ungroup()

-output

# A tibble: 6 × 5
    FID osmid     s     e seg_length
  <int> <int> <int> <int>      <dbl>
1     2   499   713    96        7.7
2     7   526   731    33        3.2
3     1   566   733    33        3.2
4     5   516   736    38        3.5
5     6   459   739    98        7.7
6     3    56   783    32        3.5

data

df1 <- structure(list(
  FID = 0:7,
  osmid = c(4999L, 566L, 499L, 56L, 409L, 516L, 459L, 526L),
  s = c(733L, 733L, 713L, 783L, 783L, 736L, 739L, 731L),
  e = c(99L, 33L, 96L, 32L, 98L, 38L, 98L, 33L),
  seg_length = c(7.7, 3.2, 7.7, 3.5, 7.6, 3.5, 7.7, 3.2)),
  class = "data.frame", row.names = c(NA, -8L))
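
The pipeline above makes only the start point s unique. If the end point e should be unique as well, one possible sketch is to run the same step a second time, grouped by e. This is an assumption about what is wanted, not something the question settles: the two passes are applied in sequence, so the result depends on their order and may not match every expectation in the question (FID 4, for instance, is already dropped in the s pass). with_ties = FALSE is added so that rows tied for the shortest seg_length yield exactly one row per group.

```r
library(dplyr)

# Same data as in the question, rebuilt here so the sketch is self-contained
df1 <- data.frame(
  FID = 0:7,
  osmid = c(4999L, 566L, 499L, 56L, 409L, 516L, 459L, 526L),
  s = c(733L, 733L, 713L, 783L, 783L, 736L, 739L, 731L),
  e = c(99L, 33L, 96L, 32L, 98L, 38L, 98L, 33L),
  seg_length = c(7.7, 3.2, 7.7, 3.5, 7.6, 3.5, 7.7, 3.2)
)

res <- df1 %>%
  # first pass: shortest segment per start point
  group_by(s) %>%
  slice_min(order_by = seg_length, n = 1, with_ties = FALSE) %>%
  # second pass: shortest segment per end point among the remaining rows
  group_by(e) %>%
  slice_min(order_by = seg_length, n = 1, with_ties = FALSE) %>%
  ungroup()

# both checks return 0 when no value is repeated
anyDuplicated(res$s)
anyDuplicated(res$e)
```

Swapping the two passes (e first, then s) is equally valid and can keep a different set of rows, so it is worth deciding which constraint takes priority.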

CodePudding user response：

We could group and then filter by min(seg_length):

library(dplyr)

df1 %>% 
  group_by(s) %>% 
  filter(seg_length == min(seg_length)) %>% 
  ungroup()

    FID osmid     s     e seg_length
  <int> <int> <int> <int>      <dbl>
1     1   566   733    33        3.2
2     2   499   713    96        7.7
3     3    56   783    32        3.5
4     5   516   736    38        3.5
5     6   459   739    98        7.7
6     7   526   731    33        3.2
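
For comparison, the same selection can be sketched in base R without dplyr: sort by seg_length, then keep the first row seen for each s. The names ord and res are only illustrative. Unlike filter(seg_length == min(seg_length)), this keeps exactly one row per start point even when several rows tie for the minimum.

```r
# Same data as in the question, rebuilt here so the sketch is self-contained
df1 <- data.frame(
  FID = 0:7,
  osmid = c(4999L, 566L, 499L, 56L, 409L, 516L, 459L, 526L),
  s = c(733L, 733L, 713L, 783L, 783L, 736L, 739L, 731L),
  e = c(99L, 33L, 96L, 32L, 98L, 38L, 98L, 33L),
  seg_length = c(7.7, 3.2, 7.7, 3.5, 7.6, 3.5, 7.7, 3.2)
)

# sort so the shortest segments come first
ord <- df1[order(df1$seg_length), ]

# duplicated() flags every repeat of s after its first occurrence,
# so negating it keeps the shortest row per start point
res <- ord[!duplicated(ord$s), ]

# restore the original FID order for readability
res <- res[order(res$FID), ]
res
```

On the example data this keeps FID 1, 2, 3, 5, 6 and 7, the same rows as the dplyr answers.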