I have a dataframe from which I want to filter/which I want to subset on row sequences fulfilling two conditions: (EDIT) (i) the first row in the sequence is Q == "q_misc"
and (ii) all rows in the sequence Seq
are not NA
df <- structure(list(Line = c(480L, 481L, 482L, 483L, 484L, 485L, 497L,
498L, 499L, 500L, 501L, 502L, 549L, 550L, 551L, 552L, 557L, 558L,
559L, 560L, 561L, 562L, 563L, 564L), Seq = c(NA, 7L, 7L, 7L,
NA, NA, NA, 0L, 0L, 0L, 0L, NA, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L,
0L, 0L, NA, NA), Q = c(NA, "q_wh", NA, NA, NA, NA, NA, "q_misc",
NA, NA, NA, NA, "q_pol", NA, NA, NA, NA, "q_misc", NA, NA, NA,
NA, NA, NA)), row.names = c(NA, -24L), class = "data.frame")
The desired output is this row sequence:
Line Seq Q
8 498 0 q_misc
9 499 0 <NA>
10 500 0 <NA>
11 501 0 <NA>
18 558 0 q_misc
19 559 0 <NA>
20 560 0 <NA>
21 561 0 <NA>
22 562 0 <NA>
I've tried this but it only returns the first row of the row sequence:
library(dplyr)
df %>%
filter(Q == "q_misc" & !is.na(Seq))
CodePudding user response:
You may use fill
to fill the missing values with previous value and select rows where value is 'q_misc'
and Seq
is not NA
.
library(dplyr)
library(tidyr)
df %>%
mutate(Q1 = Q) %>%
fill(Q1) %>%
filter(Q1 == 'q_misc' & !is.na(Seq)) %>%
select(-Q1)
# Line Seq Q
#1 498 0 q_misc
#2 499 0 <NA>
#3 500 0 <NA>
#4 501 0 <NA>
#5 558 0 q_misc
#6 559 0 <NA>
#7 560 0 <NA>
#8 561 0 <NA>
#9 562 0 <NA>
CodePudding user response:
Using na.locf
from zoo
library(zoo)
library(dplyr)
df %>%
filter(zoo::na.locf0(Q) %in% 'q_misc', complete.cases(Seq))
Line Seq Q
1 498 0 q_misc
2 499 0 <NA>
3 500 0 <NA>
4 501 0 <NA>
5 558 0 q_misc
6 559 0 <NA>
7 560 0 <NA>
8 561 0 <NA>
9 562 0 <NA>