I'm trying to subset a data frame to all rows for which a certain column value is '13', but I want all rows preceding a row with '13' to be included too, no matter the value in that column.
I do not want a row to be included twice when it both precedes a row with '13' in the specific column, but also has the value '13' itself.
Here is an example data frame and solution, whereby the condition (subset rows to rows with time = 13 and (time=13)-1, without duplicating)
ID speed dist time
A 4 12 4
B 7 10 8
C 7 18 13
C 8 4 5
A 5 6 13
D 6 2 13
E 7 2 9
Becomes
ID speed dist time
B 7 10 8
C 7 18 13
C 8 4 5
A 5 6 13
D 6 2 13
CodePudding user response:
Create a position index where 'time' value is 13 using which
and then subtract 1 from the index and concatenate both to subset
i1 <- which(df1$time == 13)
ind <- sort(unique(i1 - rep(c(1, 0), each = length(i1))))
ind <- ind[ind >0]
df1[ind,]
-output
ID speed dist time
2 B 7 10 8
3 C 7 18 13
4 C 8 4 5
5 A 5 6 13
6 D 6 2 13
data
df1 <- structure(list(ID = c("A", "B", "C", "C", "A", "D", "E"), speed = c(4L,
7L, 7L, 8L, 5L, 6L, 7L), dist = c(12L, 10L, 18L, 4L, 6L, 2L,
2L), time = c(4L, 8L, 13L, 5L, 13L, 13L, 9L)),
class = "data.frame", row.names = c(NA,
-7L))
CodePudding user response:
df <- read.table(text = '
ID speed dist time
A 4 12 4
B 7 10 8
C 7 18 13
C 8 4 5
A 5 6 13
D 6 2 13
E 7 2 9
', header = TRUE)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.1.2
df %>%
filter(time == 13 | lead(time == 13))
#> ID speed dist time
#> 1 B 7 10 8
#> 2 C 7 18 13
#> 3 C 8 4 5
#> 4 A 5 6 13
#> 5 D 6 2 13
Created on 2022-07-19 by the reprex package (v2.0.1)