I have data set that shows below:
ID, Colour
1, Red
1, Yellow
1, Blue
1, Green
1, Pink
1, Red
1, Yellow
1, Blue
1, Red
1, Red
2, Red
2, Yellow
2, Blue
2, Blue
2, Yellow
2, Red
2, Blue
2, Red
2, Red
3, Blue
3, Blue
3, Red
3, Red
3, Yellow
3, Blue
3, Red
3, Yellow
3, Blue
I want to filter the row, which consists of this pattern of Red, Yellow, Blue.
The result should be like this, with the index of each pattern:
ID, Colour, Index
1, Red, 1
1, Yellow, 1
1, Blue, 1
1, Red, 2
1, Yellow, 2
1, Blue, 2
2, Red, 3
2, Yellow, 3
2, Blue, 3
3, Red, 4
3, Yellow, 4
3, Blue, 4
3, Red, 5
3, Yellow, 5
3, Blue, 5
Thank you in advance, hopefully, anyone can help.
CodePudding user response:
We may use
library(dplyr)
library(zoo)
df1 %>%
group_by(ID) %>%
slice(sort(unique(rep(which(rollapply(Colour, width = 3,
FUN = toString) == "Red, Yellow, Blue"), each = 3) 0:2))) %>%
ungroup %>%
mutate(Index = cumsum(Colour == "Red"))
-output
# A tibble: 15 × 3
ID Colour Index
<int> <chr> <int>
1 1 Red 1
2 1 Yellow 1
3 1 Blue 1
4 1 Red 2
5 1 Yellow 2
6 1 Blue 2
7 2 Red 3
8 2 Yellow 3
9 2 Blue 3
10 3 Red 4
11 3 Yellow 4
12 3 Blue 4
13 3 Red 5
14 3 Yellow 5
15 3 Blue 5
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L), Colour = c("Red", "Yellow", "Blue", "Green", "Pink",
"Red", "Yellow", "Blue", "Red", "Red", "Red", "Yellow", "Blue",
"Blue", "Yellow", "Red", "Blue", "Red", "Red", "Blue", "Blue",
"Red", "Red", "Yellow", "Blue", "Red", "Yellow", "Blue")), row.names = c(NA,
-28L), class = "data.frame")
CodePudding user response:
Here is a data.table approach, that uses Reduce
and shift
to find the IDs that have the pattern
library(data.table)
out = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]
out = out[,index:=.I][rep(1:.N,3)][, Colour:=rep(c("Red", "Yellow", "Blue"), length.out=.N)]
ID Colour index
<int> <char> <int>
1: 1 Red 1
2: 1 Yellow 2
3: 2 Blue 3
4: 3 Red 4
5: 3 Yellow 5
6: 1 Blue 1
7: 1 Red 2
8: 2 Yellow 3
9: 3 Blue 4
10: 3 Red 5
11: 1 Yellow 1
12: 1 Blue 2
13: 2 Red 3
14: 3 Yellow 4
15: 3 Blue 5
Another good way to do this is create a function, f
that creates the output, given some ids
f <- function(id) {
data.table(
ID = rep(id,each=3),
Colour=rep(c("Red", "Yellow","Blue"),times=length(id)),
index=rep(1:length(id), times=3))
}
Just get the ids, using the above strategy
ids = setDT(d)[Reduce(paste, shift(Colour,0:-2))=="Red Yellow Blue"]$ID
and apply to the function
f(ids)
ID Colour index
<int> <char> <int>
1: 1 Red 1
2: 1 Yellow 2
3: 1 Blue 3
4: 1 Red 4
5: 1 Yellow 5
6: 1 Blue 1
7: 2 Red 2
8: 2 Yellow 3
9: 2 Blue 4
10: 3 Red 5
11: 3 Yellow 1
12: 3 Blue 2
13: 3 Red 3
14: 3 Yellow 4
15: 3 Blue 5