I asked a question a few months back about how to identify and keep only observations that follow a certain pattern: How can I identify patterns over several rows in a column and fill a new column with information about that pattern using R?
I want to take this a step further. In that question I just wanted to identify the pattern. Now, if the pattern appears several times within a group, how do I keep only the last occurrence of it? For example, given df1, how can I achieve df2?
df1
TIME ID D
12:30:10 2 0
12:30:42 2 0
12:30:59 2 1
12:31:20 2 0
12:31:50 2 0
12:32:11 2 0
12:32:45 2 1
12:33:10 2 1
12:33:33 2 1
12:33:55 2 1
12:34:15 2 0
12:34:30 2 0
12:35:30 2 0
12:36:30 2 0
12:36:45 2 0
12:37:00 2 0
12:38:00 2 1
I want to end up with the following df2
df2
TIME ID D
12:33:55 2 1
12:34:15 2 0
12:34:30 2 0
12:35:30 2 0
12:36:30 2 0
12:36:45 2 0
12:37:00 2 0
12:38:00 2 1
Thoughts? There were some helpful answers in the question I linked above, but I now want to narrow it.
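For reproducibility, the sample data above can be built roughly like this. This is a sketch that assumes TIME is stored as character and ID and D as integers (the column types are not stated in the tables); df2 is simply rows 10 to 17 of df1.

```r
# Sample data from the question; column types are an assumption
df1 <- data.frame(
  TIME = c("12:30:10", "12:30:42", "12:30:59", "12:31:20", "12:31:50",
           "12:32:11", "12:32:45", "12:33:10", "12:33:33", "12:33:55",
           "12:34:15", "12:34:30", "12:35:30", "12:36:30", "12:36:45",
           "12:37:00", "12:38:00"),
  ID = rep(2L, 17),
  D = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Desired result: the last 1-followed-by-0s stretch plus the 1 that closes it
df2 <- df1[10:17, ]
```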
CodePudding user response:
Here is a base R function that I find too complicated, but it gets what is asked for.
If I understood the pattern correctly, it doesn't matter whether the last sequence ends in a 1 or a 0. The test with df1b has a last sequence ending in a 0.
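keep_last_pattern <- function(data, col){
  x <- data[[col]]
  # treat a trailing 0 as if it closed the last sequence
  if(x[length(x)] == 0) x[length(x)] <- 1
  # flag every run that starts with a 1 and is followed by at least one 0
  i <- ave(x, cumsum(x), FUN = \(y) y[1] == 1 & length(y) > 1)
  r <- rle(i)
  l <- length(r$lengths)
  n <- which(as.logical(r$values))
  # unflag all flagged runs except the last one
  r$values[ n[-length(n)] ] <- 0
  # also keep the final row when it forms an unflagged run of length 1
  r$values[l] <- r$lengths[l] == 1 && r$values[l] == 0
  j <- as.logical(inverse.rle(r))
  # return only the flagged rows
  data[j, ]
}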
keep_last_pattern(df1, "D")
df1b <- df1
df1b[17, "D"] <- 0
keep_last_pattern(df1b, "D")
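For comparison, here is a shorter base R sketch of the same idea. It assumes the pattern is "a 1 immediately followed by at least one 0" and that the kept block runs from the last such 1 up to the next 1 (or to the end of the data if no later 1 exists). The name keep_last_run is made up for illustration, and the function is not guaranteed to match keep_last_pattern on every input (for example when the closing 1 is followed by more 1s).

```r
# Hypothetical helper: keep the last "1 followed by 0s" stretch and its closing row
keep_last_run <- function(data, col) {
  x <- data[[col]]
  n <- length(x)
  # positions where a 1 is immediately followed by a 0
  starts <- which(x[-n] == 1 & x[-1] == 0)
  if (length(starts) == 0) return(data[0, , drop = FALSE])
  s <- starts[length(starts)]
  # first 1 after the start closes the stretch; otherwise run to the end
  later_ones <- which(x == 1 & seq_len(n) > s)
  e <- if (length(later_ones) > 0) later_ones[1] else n
  data[s:e, , drop = FALSE]
}

# Sample data from the question (TIME as character is an assumption)
df1 <- data.frame(
  TIME = c("12:30:10", "12:30:42", "12:30:59", "12:31:20", "12:31:50",
           "12:32:11", "12:32:45", "12:33:10", "12:33:33", "12:33:55",
           "12:34:15", "12:34:30", "12:35:30", "12:36:30", "12:36:45",
           "12:37:00", "12:38:00"),
  ID = rep(2L, 17),
  D = c(0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)
res <- keep_last_run(df1, "D")

# Same test as above with a trailing 0
df1b <- df1
df1b[17, "D"] <- 0L
res_b <- keep_last_run(df1b, "D")
```

On both df1 and df1b this keeps rows 10 to 17, which is the same selection keep_last_pattern makes for these two inputs.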
CodePudding user response:
Do you want the rows in each ID between the second-to-last 1 and the last 1?
Here is a function that does that, which can be applied to each ID.
library(dplyr)

extract_sequence <- function(x) {
  # positions of all 1s in the group
  inds <- which(x == 1)
  # row range from the second-to-last 1 to the last 1
  inds[length(inds) - 1]:inds[length(inds)]
}

df1 %>%
  group_by(ID) %>%
  slice(extract_sequence(D)) %>%
  ungroup()
# TIME ID D
# <chr> <int> <int>
#1 12:33:55 2 1
#2 12:34:15 2 0
#3 12:34:30 2 0
#4 12:35:30 2 0
#5 12:36:30 2 0
#6 12:36:45 2 0
#7 12:37:00 2 0
#8 12:38:00 2 1
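One caveat: if an ID contains fewer than two 1s, inds[length(inds) - 1] is empty and the `:` call fails. A guarded variant might look like the sketch below. The name extract_sequence_safe is made up for illustration, and returning no rows for such groups is an assumption about the desired behaviour.

```r
# Hypothetical guarded variant: returns no positions when there are fewer than two 1s
extract_sequence_safe <- function(x) {
  inds <- which(x == 1)
  if (length(inds) < 2) return(integer(0))
  inds[length(inds) - 1]:inds[length(inds)]
}

# D column from the question: second-to-last 1 is at row 10, last 1 at row 17
d <- c(0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1)
rows_full <- extract_sequence_safe(d)

# A group with a single 1 yields an empty selection instead of an error
rows_single <- extract_sequence_safe(c(0, 1, 0))
```

It can replace extract_sequence inside slice(); an empty integer vector simply drops all rows of that group.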
CodePudding user response:
Not sure this will help as it's unclear what your pattern is. Let's assume you have data like this, with one column indicating in some way whether the row matches a pattern or not:
set.seed(123)
df <- data.frame(
  grp = sample(LETTERS[1:3], 10, replace = TRUE),
  x = 1:10,
  y = c(0,1,0,0,1,1,1,1,1,1),
  pattern = rep(c("TRUE", "FALSE"),5)
)
If the aim is to keep only the last occurrence of pattern == "TRUE" per group, this might work:
library(dplyr)

df %>%
  filter(pattern == "TRUE") %>%
  group_by(grp) %>%
  # keep the last matching row in each group
  slice_tail(n = 1)
# A tibble: 3 x 4
# Groups:   grp [3]
#   grp       x     y pattern
#   <chr> <int> <dbl> <chr>
# 1 A         1     0 TRUE
# 2 B         9     1 TRUE
# 3 C         5     1 TRUE