Extract sequence of rows in R

Time:07-25

I have this type of data:

df <- structure(list(Utterance = c("(5.127)", ">like I don't understand< sorry like how old's your mom¿", 
                                   "(0.855)", "eh six:ty:::-one=", "(0.101)", "(0.487)", "[((v: gasps)) she said] ~no you're [not?]~", 
                                   "[((v: gasps)) she said] ~no you're [not?]~", "~<[NO YOU'RE] NOT (.) you can't go !in!>~", 
                                   "(0.260)", "show her [your boobs] next time"), 
                     Q = c(NA, "q_wh", "", "", NA, NA, "q_really", "", "", NA, NA), 
                     Sequ = c(NA, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, NA, NA)), class = "data.frame", row.names = c(NA, -11L))

I would like to extract/filter

  • those rows where Sequ is not NA and
  • the immediately preceding row (where Sequ is NA)

My attempt so far is to define a function that gets the indices of the relevant rows:

QA_sequ <- function(value) {
  inds <- which(!is.na(value) & lag(is.na(value)))  
  sort(unique(c(inds-1, inds)))
}

and then to slice out the rows via the indices:

library(dplyr)
df %>% 
  slice(QA_sequ(Sequ))
                                                 Utterance        Q Sequ
1                                                  (5.127)     <NA>   NA
2 >like I don't understand< sorry like how old's your mom¿     q_wh    1
3                                                  (0.487)     <NA>   NA
4               [((v: gasps)) she said] ~no you're [not?]~ q_really    0

However, this returns only the immediately preceding row and the first row of each Sequ run. The result I want to obtain is this:

                                                  Utterance        Q Sequ
1                                                   (5.127)     <NA>   NA
2  >like I don't understand< sorry like how old's your mom¿     q_wh    1
3                                                   (0.855)             1
4                                         eh six:ty:::-one=             1
5                                                   (0.487)     <NA>   NA
6                [((v: gasps)) she said] ~no you're [not?]~ q_really    0
7                [((v: gasps)) she said] ~no you're [not?]~             0
8                 ~<[NO YOU'RE] NOT (.) you can't go !in!>~             0

EDIT:

The solution I've come up with feels cumbersome:

QA_sequ <- function(value) {
  inds <- which(!is.na(value) & lag(is.na(value)))  
  sort(unique(c(inds-1)))    # extract only preceding row!
}

library(dplyr)
df %>% 
  mutate(id = row_number()) %>%
  slice(QA_sequ(Sequ)) %>%
  bind_rows(., df %>% mutate(id = row_number()) %>% filter(!is.na(Sequ))) %>%
  arrange(id)

CodePudding user response:

How about this?

df %>%
  filter(!is.na(Sequ) | lead(!is.na(Sequ), default=FALSE))
#                                                  Utterance        Q Sequ
# 1                                                  (5.127)     <NA>   NA
# 2 >like I don't understand< sorry like how old's your mom¿     q_wh    1
# 3                                                  (0.855)             1
# 4                                        eh six:ty:::-one=             1
# 5                                                  (0.487)     <NA>   NA
# 6               [((v: gasps)) she said] ~no you're [not?]~ q_really    0
# 7               [((v: gasps)) she said] ~no you're [not?]~             0
# 8                ~<[NO YOU'RE] NOT (.) you can't go !in!>~             0

The filter keeps both of:

  • all rows where Sequ is not NA
  • any row where Sequ is NA but the next row's Sequ is not NA
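
To see why this works, the same mask can be built by hand in base R. This is only a minimal sketch: it uses the Sequ values from the question as a plain vector, and the names not_na, next_not_na and keep are illustrative, not part of any package.

```r
# The Sequ column from the question, as a plain vector
sequ <- c(NA, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, NA, NA)

# TRUE where Sequ has a value
not_na <- !is.na(sequ)

# Shift left by one to look at the NEXT element;
# the last element has no successor, so pad with FALSE
# (this plays the role of lead(..., default = FALSE))
next_not_na <- c(not_na[-1], FALSE)

# Keep a row if it is non-NA itself, or if the row after it is non-NA
keep <- not_na | next_not_na

which(keep)
# [1] 1 2 3 4 6 7 8 9
```

With the data frame from the question, df[keep, ] should then select the same eight rows as the filter() call.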

CodePudding user response:

Just add an additional OR so that every row where Sequ is not NA is collected, not only those whose preceding row is NA. (The combined condition is then logically equivalent to !is.na(value) on its own.)

QA_sequ <- function(value) {
  inds <- which((!is.na(value) & lag(is.na(value))) | !is.na(value))  
  sort(unique(c(inds-1, inds)))
}

df %>%  slice(QA_sequ(Sequ))
                                                 Utterance        Q Sequ
1                                                  (5.127)     <NA>   NA
2 >like I don't understand< sorry like how old's your mom¿     q_wh    1
3                                                  (0.855)             1
4                                        eh six:ty:::-one=             1
5                                                  (0.487)     <NA>   NA
6               [((v: gasps)) she said] ~no you're [not?]~ q_really    0
7               [((v: gasps)) she said] ~no you're [not?]~             0
8                ~<[NO YOU'RE] NOT (.) you can't go !in!>~             0

CodePudding user response:

This uses base R. As in your approach, take the indices of the non-NA rows, then add the index of the row preceding each of them.

x <- which(!is.na(df$Sequ))
x1 <- x - 1
x <- unique(c(x, x1))
x <- x[order(x)]

df[x,]

You can also pass the same vector to slice: df %>% slice(x)
