Home > front end >  Filter row sequence defined by values in several columns
Filter row sequence defined by values in several columns

Time:09-24

I have a dataframe from which I want to filter/which I want to subset on row sequences fulfilling two conditions: (EDIT) (i) the first row in the sequence is Q == "q_misc" and (ii) all rows in the sequence Seq are not NA

df <- structure(list(Line = c(480L, 481L, 482L, 483L, 484L, 485L, 497L, 
498L, 499L, 500L, 501L, 502L, 549L, 550L, 551L, 552L, 557L, 558L, 
559L, 560L, 561L, 562L, 563L, 564L), Seq = c(NA, 7L, 7L, 7L, 
NA, NA, NA, 0L, 0L, 0L, 0L, NA, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 
0L, 0L, NA, NA), Q = c(NA, "q_wh", NA, NA, NA, NA, NA, "q_misc", 
NA, NA, NA, NA, "q_pol", NA, NA, NA, NA, "q_misc", NA, NA, NA, 
NA, NA, NA)), row.names = c(NA, -24L), class = "data.frame")

The desired output is this row sequence:

   Line Seq      Q
8   498   0 q_misc
9   499   0   <NA>
10  500   0   <NA>
11  501   0   <NA>
18  558   0 q_misc
19  559   0   <NA>
20  560   0   <NA>
21  561   0   <NA>
22  562   0   <NA>

I've tried this but it only returns the first row of the row sequence:

library(dplyr)
df %>%
  filter(Q == "q_misc" & !is.na(Seq))

CodePudding user response:

You may use fill to fill the missing values with previous value and select rows where value is 'q_misc' and Seq is not NA.

library(dplyr)
library(tidyr)

df %>%
  mutate(Q1 = Q) %>%
  fill(Q1) %>%
  filter(Q1 == 'q_misc' & !is.na(Seq)) %>%
  select(-Q1)

#  Line Seq      Q
#1  498   0 q_misc
#2  499   0   <NA>
#3  500   0   <NA>
#4  501   0   <NA>
#5  558   0 q_misc
#6  559   0   <NA>
#7  560   0   <NA>
#8  561   0   <NA>
#9  562   0   <NA>

CodePudding user response:

Using na.locf from zoo

library(zoo)
library(dplyr)
df %>% 
   filter(zoo::na.locf0(Q) %in% 'q_misc', complete.cases(Seq))
  Line Seq      Q
1  498   0 q_misc
2  499   0   <NA>
3  500   0   <NA>
4  501   0   <NA>
5  558   0 q_misc
6  559   0   <NA>
7  560   0   <NA>
8  561   0   <NA>
9  562   0   <NA>
  • Related