Home > front end >  Filter data with specific pattern on following line in R
Filter data with specific pattern on following line in R

Time:03-25

I have data set that shows below:

ID, Colour
1, Red
1, Yellow
1, Blue
1, Green
1, Pink
1, Red
1, Yellow
1, Blue
1, Red
1, Red
2, Red
2, Yellow
2, Blue
2, Blue
2, Yellow
2, Red
2, Blue
2, Red
2, Red
3, Blue
3, Blue
3, Red
3, Red
3, Yellow
3, Blue
3, Red
3, Yellow
3, Blue

I want to filter the row, which consists of this pattern of Red, Yellow, Blue.

The result should be like this, with the index of each pattern:

ID, Colour, Index
        1, Red, 1
        1, Yellow, 1
        1, Blue, 1
        1, Red, 2
        1, Yellow, 2
        1, Blue, 2
        2, Red, 3
        2, Yellow, 3
        2, Blue, 3
        3, Red, 4
        3, Yellow, 4
        3, Blue, 4
        3, Red, 5
        3, Yellow, 5
        3, Blue, 5

Thank you in advance, hopefully, anyone can help.

CodePudding user response:

We may use

library(dplyr)
library(zoo)
df1 %>%
    group_by(ID) %>%
    slice(sort(unique(rep(which(rollapply(Colour, width = 3,
       FUN = toString) == "Red, Yellow, Blue"), each = 3)   0:2))) %>%
    ungroup %>%
    mutate(Index = cumsum(Colour == "Red"))

-output

# A tibble: 15 × 3
      ID Colour Index
   <int> <chr>  <int>
 1     1 Red        1
 2     1 Yellow     1
 3     1 Blue       1
 4     1 Red        2
 5     1 Yellow     2
 6     1 Blue       2
 7     2 Red        3
 8     2 Yellow     3
 9     2 Blue       3
10     3 Red        4
11     3 Yellow     4
12     3 Blue       4
13     3 Red        5
14     3 Yellow     5
15     3 Blue       5

data

df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L), Colour = c("Red", "Yellow", "Blue", "Green", "Pink", 
"Red", "Yellow", "Blue", "Red", "Red", "Red", "Yellow", "Blue", 
"Blue", "Yellow", "Red", "Blue", "Red", "Red", "Blue", "Blue", 
"Red", "Red", "Yellow", "Blue", "Red", "Yellow", "Blue")), row.names = c(NA, 
-28L), class = "data.frame")

CodePudding user response:

Here is a data.table approach, that uses Reduce and shift to find the IDs that have the pattern

library(data.table)

out = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]
out = out[,index:=.I][rep(1:.N,3)][, Colour:=rep(c("Red", "Yellow", "Blue"), length.out=.N)]

       ID Colour index
    <int> <char> <int>
 1:     1    Red     1
 2:     1 Yellow     2
 3:     2   Blue     3
 4:     3    Red     4
 5:     3 Yellow     5
 6:     1   Blue     1
 7:     1    Red     2
 8:     2 Yellow     3
 9:     3   Blue     4
10:     3    Red     5
11:     1 Yellow     1
12:     1   Blue     2
13:     2    Red     3
14:     3 Yellow     4
15:     3   Blue     5

Another good way to do this is create a function, f that creates the output, given some ids

f <- function(id) {
  data.table(
    ID = rep(id,each=3),
    Colour=rep(c("Red", "Yellow","Blue"),times=length(id)),
    index=rep(1:length(id), times=3))
}

Just get the ids, using the above strategy

ids = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]$ID

and apply to the function

f(ids)

       ID Colour index
    <int> <char> <int>
 1:     1    Red     1
 2:     1 Yellow     2
 3:     1   Blue     3
 4:     1    Red     4
 5:     1 Yellow     5
 6:     1   Blue     1
 7:     2    Red     2
 8:     2 Yellow     3
 9:     2   Blue     4
10:     3    Red     5
11:     3 Yellow     1
12:     3   Blue     2
13:     3    Red     3
14:     3 Yellow     4
15:     3   Blue     5
  • Related