Proper preprocessing in R

Time:06-14

I have a dataset like this:

FID osmid s e seg_length
0 4999 733 99 7.7
1 566 733 33 3.2
2 499 713 96 7.7
3 56 783 32 3.5
4 409 783 98 7.6
5 516 736 38 3.5
6 459 739 98 7.7
7 526 731 33 3.2

s stands for starting point and e for ending point. Some FIDs share the same start point or the same end point with another FID. I want each start point and each end point to appear only once, so when two rows share a starting point I want to keep only the one with the shortest seg_length. I couldn't find good code for that.

For example, FID 0 and 1 share the same starting point, and in the new dataset only FID 1 should be there. Also, FID 4 and FID 6 share the same end point, and in the new dataset only FID 4 should be there.
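To see which rows are affected, a quick base-R check can flag every FID whose start or end point occurs more than once. This is a sketch: is_shared() is a hypothetical helper, and the data are typed in from the table above (FID 4 with seg_length 7.6). It also shows that FID 1 and FID 7 share end point 33, both with seg_length 3.2, so the end points need a tie-breaking rule as well.

```r
# Data typed in from the table in the question
df1 <- data.frame(
  FID = 0:7,
  osmid = c(4999L, 566L, 499L, 56L, 409L, 516L, 459L, 526L),
  s = c(733L, 733L, 713L, 783L, 783L, 736L, 739L, 731L),
  e = c(99L, 33L, 96L, 32L, 98L, 38L, 98L, 33L),
  seg_length = c(7.7, 3.2, 7.7, 3.5, 7.6, 3.5, 7.7, 3.2)
)

# Hypothetical helper: TRUE for every element whose value occurs more than once
is_shared <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

shared_start <- df1$FID[is_shared(df1$s)]  # FIDs with a shared start point
shared_end <- df1$FID[is_shared(df1$e)]    # FIDs with a shared end point
print(shared_start)  # [1] 0 1 3 4
print(shared_end)    # [1] 1 4 6 7
```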

Answer:

We can use slice_min() after grouping by s:

library(dplyr)
df1 %>% 
   group_by(s) %>%                              # one group per start point
   slice_min(n = 1, order_by = seg_length) %>%  # keep the shortest segment(s) in each group
   ungroup()                                    # drop the grouping

-output

# A tibble: 6 × 5
    FID osmid     s     e seg_length
  <int> <int> <int> <int>      <dbl>
1     2   499   713    96        7.7
2     7   526   731    33        3.2
3     1   566   733    33        3.2
4     5   516   736    38        3.5
5     6   459   739    98        7.7
6     3    56   783    32        3.5

data

df1 <- structure(list(FID = 0:7, osmid = c(4999L, 566L, 499L, 56L, 409L, 
516L, 459L, 526L), s = c(733L, 733L, 713L, 783L, 783L, 736L, 
739L, 731L), e = c(99L, 33L, 96L, 32L, 98L, 38L, 98L, 33L), seg_length = c(7.7, 
3.2, 7.7, 3.5, 7.6, 3.5, 7.7, 3.2)), 
class = "data.frame", row.names = c(NA, 
-8L))
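The code above only makes the start points unique; end points can still repeat (FID 1 and FID 7 both keep end point 33). A minimal sketch that applies the same idea to e as a second step, assuming dplyr 1.0.0 or later and using with_ties = FALSE so exactly one row survives per group even when lengths tie. Note that the order of the two steps matters: FID 4 already loses start point 783 to FID 3 in the first step, so FID 6 is the only remaining row with end point 98, and running the e step first may give a different result.

```r
library(dplyr)

# Same data as above, with seg_length 7.6 for FID 4 as in the question's table
df1 <- data.frame(
  FID = 0:7,
  osmid = c(4999L, 566L, 499L, 56L, 409L, 516L, 459L, 526L),
  s = c(733L, 733L, 713L, 783L, 783L, 736L, 739L, 731L),
  e = c(99L, 33L, 96L, 32L, 98L, 38L, 98L, 33L),
  seg_length = c(7.7, 3.2, 7.7, 3.5, 7.6, 3.5, 7.7, 3.2)
)

result <- df1 %>%
  # step 1: one row per start point, the shortest segment (single row on ties)
  group_by(s) %>%
  slice_min(seg_length, n = 1, with_ties = FALSE) %>%
  # step 2: regroup by end point (replaces the s grouping) and repeat
  group_by(e) %>%
  slice_min(seg_length, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```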

Answer:

We could group by s and then keep the rows where seg_length equals the group minimum:

library(dplyr)

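df1 %>% 
  group_by(s) %>%                            # one group per start point
  filter(seg_length == min(seg_length)) %>%  # keep rows at the group minimum (ties included)
  ungroup()

-output

# A tibble: 6 × 5
    FID osmid     s     e seg_length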
  <int> <int> <int> <int>      <dbl>
1     1   566   733    33        3.2
2     2   499   713    96        7.7
3     3    56   783    32        3.5
4     5   516   736    38        3.5
5     6   459   739    98        7.7
6     7   526   731    33        3.2
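Both answers keep every tied row: filter() keeps all rows equal to the minimum, and slice_min() defaults to with_ties = TRUE. This does not matter for the start points in this data, but it would if two segments with the same start point had the same length. A small sketch on hypothetical toy data (the ties data frame is made up for illustration) shows the difference and how with_ties = FALSE forces a single row per group.

```r
library(dplyr)

# Hypothetical toy data: FID 1 and 2 share start point 1 and have equal length
ties <- data.frame(FID = 1:3, s = c(1L, 1L, 2L), seg_length = c(3.2, 3.2, 5.0))

# filter() keeps every row equal to the group minimum, so both tied rows survive
kept_filter <- ties %>%
  group_by(s) %>%
  filter(seg_length == min(seg_length)) %>%
  ungroup()

# slice_min() with with_ties = FALSE keeps exactly one row per group
kept_slice <- ties %>%
  group_by(s) %>%
  slice_min(seg_length, n = 1, with_ties = FALSE) %>%
  ungroup()

print(nrow(kept_filter))  # 3
print(nrow(kept_slice))   # 2
```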